Python is an open-source, high-level, general-purpose, interpreted programming language, and one of the most popular languages for data science applications.
The official Python website.

Why Python

  • As a general-purpose language, Python supports a large range of tasks.

    • Or put another way: ‘Python isn’t the best at anything, but it’s second best at everything’

    • This is useful. A data science project may include everything from scraping data from the web, analyzing a mixture of text and numerical data, computing features, and training a model, to creating high-quality graphs and hosting a website with your results.

  • Python is explicitly, and by design, user-friendly.

  • Python also has a massive user community that contributes to a large number of high-quality, well-maintained open-source tools.

    • The best language for your project is one which has the things you need.

  • In part for the reasons listed above, Python is heavily used in industry.

The Python programming language is developed and maintained by the Python Software Foundation.

Python Versions

This class uses Python 3, the currently developed version of Python, and more specifically Python version 3.6 or above.

Python 2 has reached “End of Life”, meaning it is no longer supported or maintained by the Python Organization.
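
To confirm which version you are running, a minimal sketch like the following can help. The helper name `meets_minimum` and the constant `MIN_VERSION` are made up for illustration; only `sys.version` and `sys.version_info` come from the standard library.

```python
import sys

# Minimum version assumed by these materials (3.6, as stated above).
MIN_VERSION = (3, 6)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# Report the running interpreter's version and whether it is recent enough.
print("Running Python", sys.version.split()[0])
print("Meets the 3.6+ requirement:", meets_minimum())
```

Comparing tuples works here because Python compares them element by element, so (3, 10) correctly counts as newer than (3, 6).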

Python Resources

These materials presume prior knowledge of the Python programming language.

If you are not yet familiar with Python, here are some entry-level materials for learning it:

  • Codecademy is good for a beginner’s introduction to the language.

  • The Official Beginners Guide is supported by the Python organization.

  • Whirlwind Tour of Python is a free collection of Jupyter notebooks that takes you through Python.

    • This book is especially good if you have some experience programming in another language and want to quickly run through the specifics of Python; it was designed for exactly that audience.

A much broader list of resources and guides for learning Python is available here.

Getting Un-Stuck

At some point, you will get stuck. It happens. The internet is your friend.

If you get an error, or aren’t sure how to proceed, use {your favourite search engine} with specific search terms relating to what you are trying to do. Sometimes this just means searching for the error message that you got.

You are likely to find responses on Stack Overflow, which is basically a forum for programming questions and a good place to find answers.
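
As a rough illustration of what “searching the error” can look like, the sketch below triggers an error on purpose and builds a search term from it. The variable name `search_term` is made up for this example, and the exact wording of the message may differ between Python versions.

```python
try:
    # Deliberately mix a string and an integer to trigger an error.
    total = "3" + 4
except TypeError as error:
    # The exception type plus its message is usually a good search term.
    search_term = f"{type(error).__name__}: {error}"
    print(search_term)
```

In everyday use you would not need the `try`/`except`: the last line of the traceback that Python prints already has this form, and can be copied straight into a search engine.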

Standard Library

The Standard Library refers to everything that is part of the standard version and installation of Python.
The Python Standard Library comes with a lot of basic functionality.

Part of what makes Python a powerful language is the standard library itself, which is a rich set of tools for programming. However, the standard library does not include data science tools, and a lot of the power of Python stems from a rich ecosystem of packages that can be added and used with Python.
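
To give a flavour of what the standard library offers without installing anything, here is a small sketch using three standard modules (`statistics`, `collections`, and `json`). The `scores` data is invented purely for illustration.

```python
import json
import statistics
from collections import Counter

# Made-up example data.
scores = [88, 92, 79, 92, 85]

# Basic descriptive statistics, straight from the standard library.
mean_score = statistics.mean(scores)

# Counter tallies occurrences; most_common(1) returns the top (value, count) pair.
most_common = Counter(scores).most_common(1)[0]

# Serialize a small summary to a JSON string.
summary = json.dumps({"mean": mean_score, "mode": most_common[0]})
print(summary)
```

This is enough for small tasks, but array computing, data frames, and plotting are handled by external packages.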


Packages are collections of code. Packages from outside the standard library can be installed and added to Python.
For managing and installing packages, Anaconda comes with the conda package manager.
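
One way to check from within Python whether a package is available in your environment is sketched below. The helper `is_installed` is a hypothetical name; it relies on `importlib.util.find_spec` from the standard library, and the package names in the loop are only examples.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# 'json' ships with Python; the others are external packages that may or
# may not be present, depending on your installation.
for name in ["json", "numpy", "pandas"]:
    status = "available" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

If a package is reported as not installed, it can typically be added from the command line with the package manager, for example `conda install numpy`.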

Scientific Python

When we say that Python is good for data science and scientific computing, what we really mean is that there is a rich ecosystem of open-source external packages that greatly expand the capabilities of the language beyond the standard library.

This set of packages, which we will introduce as we go through these materials, is sometimes referred to as ‘Scientific Python’, or the ‘SciPy’ ecosystem.

For the purposes of these materials, the Anaconda distribution that we are using contains all the packages you need.