Introduction
Welcome to the hands-on materials for Data Science in Practice.
This notebook will guide you through getting the tools you will need for working with these tutorials and assignments.
Alerts
Throughout these tutorials, you will see colored ‘alert’ text.
What do you need for these tutorials?
Software
Prerequisites
These tutorials presume that you already have some basic knowledge of programming.
In particular, they assume knowledge of the Python programming language and its standard library.
If you are somewhat unfamiliar with Python, you can follow the links in the Python notebook to catch up.
Computational Resources
The examples throughout these tutorials and in the assignments are not computationally heavy.
You should be able to run all of these materials on any computer you have access to, assuming it can run the tools described in this notebook.
Installing Python
If you are running code locally, we recommend that you install a new version of Python with Anaconda, as described below.
If you are in the official course, you can use DataHub for everything you need.
If you are on a Mac, you already have a native installation of Python. This native installation may be older, will not include the extra packages that you will need for this class, and is best left untouched.
Downloading Anaconda will install a separate, independent installation of Python, leaving your native installation untouched.
Windows does not require Python natively, so it is not typically pre-installed.
Tools
The following are the tools that you will need for this class.
Anaconda itself is a distribution, meaning that it is a version of Python with a collection of packages that are curated and maintained together.
Using a pre-built distribution is useful, as it comes with the packages that you need for data science.
Anaconda also comes with conda, a package manager that allows you to download, install, and manage other packages.
The Anaconda distribution includes all the packages that are needed for these tutorials.
Note that you do not need to download Jupyter separately, as it comes packaged with the Anaconda distribution.
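Once Anaconda is installed, a quick way to confirm that the packages you expect are available is to look them up from Python. The sketch below uses `importlib.util.find_spec` from the standard library, which locates a module without importing it. The package list (`numpy`, `pandas`, `matplotlib`, `scipy`) is an assumption for illustration, not the official requirements for this class, so adjust it to whatever your assignments actually import.

```python
import importlib.util

# Hypothetical selection of packages; adjust to what your assignments import.
EXPECTED_PACKAGES = ["numpy", "pandas", "matplotlib", "scipy"]


def missing_packages(names):
    """Return the names from `names` that cannot be found in this Python installation."""
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(EXPECTED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All expected packages are available.")
```

If anything is reported as missing, the package manager that ships with Anaconda (conda) is the usual way to install it.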
Checking Your Python Version
You can check which installation of Python you are using, and which version it is.
Once you have installed Anaconda, you should see that you are using a Python located in an anaconda folder.
The version number that is printed should be 3.6 or greater.
# Check the installed version of Python
# Note: these are shell commands that may not work on Windows (`where python` is the rough Windows equivalent of `which python`)
!which python
!python --version
/opt/anaconda3/bin/python
Python 3.7.4
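Because `which` is not available in the default Windows command prompt, you may prefer to run the same check from Python itself, which works on any operating system. This is a minimal sketch using the standard `sys` module; the `version_ok` helper is hypothetical, and its `(3, 6)` threshold is simply the minimum version stated above.

```python
import sys

# Path of the Python executable running this code; with Anaconda this is
# typically inside an "anaconda3" (or similar) folder.
print(sys.executable)

# Minimum version these tutorials ask for.
MIN_VERSION = (3, 6)


def version_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of `version_info` is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


print("Python", sys.version.split()[0], "OK" if version_ok() else "is too old")
```

Comparing `(major, minor)` tuples rather than version strings avoids the trap where the string `"3.10"` sorts before `"3.6"`.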
Git and GitHub are not the same thing, though in practice they are commonly used together: git is a tool for version-controlling code and managing multiple copies of it, both across your computer and in remote repositories stored on GitHub.
Note that while GitHub is a commercial service, git is an open-source tool and can be used independently of GitHub.
# Check that you have git installed (which version doesn't really matter)
!git --version
git version 2.20.1 (Apple Git-117)
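The same check can be done for git from Python, which avoids relying on shell commands that behave differently across operating systems. This sketch uses the standard library's `shutil.which` and `subprocess.run`; the `tool_version` helper is hypothetical, and it only tells you whether git is on your `PATH`, not whether it is configured correctly.

```python
import shutil
import subprocess


def tool_version(command):
    """Return the first line printed by `<command> --version`, or None if the command is not on PATH."""
    if shutil.which(command) is None:
        return None
    # stdout/stderr pipes and universal_newlines keep this compatible with Python 3.6.
    result = subprocess.run(
        [command, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Most tools print their version to stdout; fall back to stderr just in case.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


print(tool_version("git") or "git was not found on your PATH")
```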
You don’t need to use SourceTree (or any other GUI) if you know, or want to learn, how to use git from the command line.
Environments
You do not need to use environments; however, you may find them useful if you want or need to maintain multiple different versions of Python.
If you want to use an environment and already have conda, you can run this command from the command line:
$ conda create --name envname python=3.7 anaconda
Replace envname with the name you want to give this environment.
This will create a new environment with Python 3.7 and the Anaconda distribution.
You will then need to activate this environment every time you want to use it.
To activate your environment:
$ conda activate envname
To deactivate your environment:
$ conda deactivate
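If you are unsure which environment a notebook is running in, you can ask Python. This sketch assumes the `CONDA_DEFAULT_ENV` variable that `conda activate` sets; if Jupyter was started some other way, the variable may be missing, in which case `sys.prefix` (the folder of the running Python, typically ending in something like `envs/envname` for a named environment) is the more reliable clue. The `active_conda_env` helper is hypothetical.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none appears to be active."""
    # Assumes `conda activate` has set CONDA_DEFAULT_ENV in the environment
    # that launched this Python process.
    return environ.get("CONDA_DEFAULT_ENV")


print("Active conda environment:", active_conda_env() or "none detected")
print("Interpreter prefix:", sys.prefix)
```

Taking the environment mapping as a parameter keeps the helper easy to check without actually activating anything.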