# Introduction

Welcome to the hands-on materials for Data Science in Practice.

This notebook will guide you through getting the tools you will need to work with these tutorials and assignments.

## Alerts

Throughout these tutorials, you will see colored 'alert' text:

<div class="alert alert-success">
Green alerts provide key information and definitions.
</div>

<div class="alert alert-info">
Blue alerts provide links out to further 
<a href="https://google.com" class="alert-link">resources</a>. 
</div>

## What do you need for these tutorials?

### Software

- A working install of Python (>= 3.6), with the Anaconda distribution
    - If you are in the official class, [datahub](http://datahub.ucsd.edu) satisfies this requirement
- Jupyter Notebooks
    - Also satisfied by [datahub](http://datahub.ucsd.edu)
- git and a GitHub account

### Prerequisites

These tutorials presume that you already have some basic knowledge of programming. 

In particular, they assume knowledge of the Python programming language and its standard library. 

If you are somewhat unfamiliar with Python, you can follow the links in the Python notebook to catch up.

### Computational Resources

The examples throughout these tutorials and in the assignments are not computationally heavy. 

You should be able to run all these materials on any computer you have access to, assuming it will run the aforementioned tools. 

### Installing Python

- If you are running code locally, we recommend you install a new version of Python with Anaconda, as described below
    - If you are in the official course, you can use [datahub](http://datahub.ucsd.edu) for everything you need
- If you are on a Mac, you may already have a native installation of Python. This native installation may be older, will not include the extra packages that you will need for this class, and is best left untouched. 
    - Installing Anaconda creates a separate, independent installation of Python, leaving your native install untouched. 
- Windows does not require Python natively, so it is not typically pre-installed.

## Tools

The following is a series of tools that you will need for this class.

<br>
<br>
<img src="https://raw.githubusercontent.com/COGS108/Tutorials/master/img/anaconda.png" width="350px">
<br>
<br>

<div class="alert alert-success">
Anaconda is an open-source distribution of Python, designed for scientific computing, data science and machine learning. 
</div>

<div class="alert alert-info">
The Anaconda website is 
<a href="https://www.anaconda.com" class="alert-link">here</a>,
with the download page
<a href="https://www.anaconda.com" class="alert-link">here</a>.
</div>

Anaconda itself is a distribution, meaning that it is a version of Python with a collection of packages that are curated and maintained together. 

Using a pre-built distribution is useful, as it comes with the packages that you need for data science.

Anaconda also comes with `conda`, which is a package manager, allowing you to download, install, and manage other packages. 

The Anaconda distribution includes all packages that are needed for these tutorials.
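
If you want to confirm that a given package is available in your installation, a minimal sketch along the following lines may help. The package names listed here are assumptions chosen for illustration (this notebook does not name specific packages), and `find_missing` is a hypothetical helper, so adjust the list to whatever you need.

```python
import importlib.util

# Hypothetical list of packages to check; these names are assumptions,
# not an official list of what the tutorials require.
PACKAGES = ["numpy", "pandas", "matplotlib", "scipy"]


def find_missing(packages):
    """Return the names in `packages` that cannot be located for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are available.")
```

If anything is reported as missing, it can usually be installed with `conda install` followed by the package name.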

<br>
<br>
<img src="https://raw.githubusercontent.com/COGS108/Tutorials/master/img/jupyter.png" width="250px">
<br>
<br>

<div class="alert alert-success">
Jupyter notebooks are a way to intermix code, outputs and plain text. 
They run in a web browser, and connect to a kernel to be able to execute code. 
</div>

<div class="alert alert-info">
The official Jupyter website is available 
<a href="http://jupyter.org" class="alert-link">here</a>.
</div>

Note that you do not need to download Jupyter separately, as it comes packaged with the Anaconda distribution.

#### Checking Your Python Version

You can check which installation of Python you are using, and which version it is.

Once you have installed Anaconda, you should see that you are using Python in an anaconda folder. 

The version number that is printed should also be 3.6 or greater. 

In [1]:
# Check the installed version of Python
#   Note: these are command-line commands that may not work on Windows
!which python
!python --version

/opt/anaconda3/bin/python
Python 3.7.4
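
Since the shell commands above may not work on Windows, the same check can be sketched in Python itself, which should behave the same way on any platform. The helper name `meets_minimum` is hypothetical; the minimum version of 3.6 is taken from the software requirements listed earlier.

```python
import sys

MIN_VERSION = (3, 6)  # minimum version stated in the software requirements


def meets_minimum(version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Path of the interpreter running this code (look for an anaconda folder)
print("Interpreter:", sys.executable)
# Version string of that interpreter
print("Version:", sys.version.split()[0])
print("Meets minimum:", meets_minimum(sys.version_info))
```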


<br>
<br>
<img src="https://raw.githubusercontent.com/COGS108/Tutorials/master/img/git.png" width="300px">
<br>
<br>

<div class="alert alert-success">
Git is a tool, a software package, for version control. 
</div>

<div class="alert alert-info">
Install 
<a href="https://git-scm.com/book/en/v2/Getting-Started-Installing-Git" class="alert-link">git</a>,
if you don't already have it.
</div>

<br>
<br>
<img src="https://raw.githubusercontent.com/COGS108/Tutorials/master/img/github.png" width="300px">
<br>
<br>

<div class="alert alert-success">
GitHub is an online hosting service that can be used with git, and offers online tools to use git. 
</div>

<div class="alert alert-info">
Create an account on 
<a href="https://github.com/" class="alert-link">GitHub</a>.
</div>

Git and GitHub are not the same thing, though in practice they are commonly used together. Git is the tool that version-controls your code and manages the copies stored on your computer and in remote repositories, such as those hosted on GitHub.

Note that while GitHub is a commercial service (owned by Microsoft), git is an open-source tool and can be used independently of GitHub.

In [2]:
# Check that you have git installed (which version doesn't really matter)
!git --version

git version 2.20.1 (Apple Git-117)
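
As a cross-platform alternative to the shell command above, the following sketch checks for git from Python using only the standard library. The function name `git_version` is hypothetical, and the code assumes that git, if installed, is reachable on your `PATH`.

```python
import shutil
import subprocess


def git_version():
    """Return the output of `git --version`, or None if git is not on the PATH."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6+)
    )
    return result.stdout.strip() if result.returncode == 0 else None


version = git_version()
print(version if version else "git was not found on the PATH")
```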


<br>
<br>
<img src="https://raw.githubusercontent.com/COGS108/Tutorials/master/img/sourcetree.png" width="500px">
<br>
<br>

<div class="alert alert-success">
Sourcetree is a free graphical user interface (GUI) for managing repositories with git & GitHub. 
</div>

<div class="alert alert-info">
Sourcetree is available 
<a href="https://www.sourcetreeapp.com" class="alert-link">here</a>.
You will need an account with 
<a href="https://www.atlassian.com" class="alert-link">Atlassian</a>,
which makes Sourcetree, but the account is free.
</div>

You don't need to use Sourcetree (or any other GUI) if you know, or want to learn, how to use git from the command line.

## Environments

<div class="alert alert-success">
Environments are isolated, independent installations of a programming language and groups of packages that don't interfere with each other. 
</div>

<div class="alert alert-info">
Anaconda has detailed instructions on using environments available 
<a href="https://conda.io/docs/using/envs.html" class="alert-link">here</a>.
</div>

You do not need to use environments; however, you may find them useful if you want or need to maintain multiple different versions of Python. 

If you want to use an environment, and already have conda, you can run this command from the command line: <br>

``$ conda create --name *envname* python=3.7 anaconda`` <br>

^ Replace '*envname*' with the name you want to give this environment.<br>

This will create a new environment, with Python 3.7 and the Anaconda distribution.

You will then need to activate this environment every time you want to use it. 

To activate your environment: <br>
``$ conda activate *envname*``

To deactivate your environment: <br>
``$ conda deactivate``
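
To see which environment a running notebook or script is actually using, a small sketch like the one below may be useful. It relies on the `CONDA_DEFAULT_ENV` environment variable, which conda is assumed to set when an environment is activated; the helper name `active_conda_env` is hypothetical.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return the active conda environment name, or None if none is set.

    Assumes conda exports CONDA_DEFAULT_ENV on activation.
    """
    return environ.get("CONDA_DEFAULT_ENV") or None


print("Active conda environment:", active_conda_env())
# The prefix is the folder that holds the interpreter currently in use
print("Interpreter prefix:", sys.prefix)
```

If the name printed is not the one you expected, the notebook was probably started before the environment was activated, or from a different environment.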