Appendix: Python Packages

The following is a general overview of Python packages that may be useful for data science.
For a much broader list covering the wider Python ecosystem, check out the Awesome Python list.

Data-Science Modules

These are all external (non-standard library) packages. Many of them are available in the Anaconda distribution.

Core Packages

  • scipy - mathematics, science, and engineering.

  • numpy - numerical computing with arrays & array operations.

  • pandas - data structures and data analysis.

  • scikit-learn - machine learning and data analysis.

Text Mining

  • nltk - natural language processing.

  • gensim - topic modelling.

Mathematics & Statistics

Web Scraping

Plotting / Visualization Libraries

  • matplotlib - 2D plotting library.

  • seaborn - visualization (based on matplotlib).

  • bokeh - interactive visualizations.

Graph Theory / Networks

Deep Learning

  • theano - mathematical operations on multi-dimensional arrays.

  • tensorflow - numerical computation using data flow graphs.

  • keras - a high-level neural network library.

Useful parts of the standard library

The full list of modules in the standard library is available in the official Python documentation.

Basic Utilities

  • os - miscellaneous operating system operations.

  • sys - system-specific parameters and functions (interpreter information, command-line arguments).

  • datetime - manipulating dates & times.

  • glob - finding path names that match a wildcard pattern.
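
As a rough illustration of how these four modules tend to be used together, here is a minimal sketch. The file names and the temporary directory are made up for the demo and are not part of the original material.

```python
import glob
import os
import sys
import tempfile
from datetime import datetime, timedelta

# Create a throwaway directory with a few empty files (demo data only).
workdir = tempfile.mkdtemp()
for name in ("a.csv", "b.csv", "notes.txt"):
    # os.path.join builds a path that is valid on the current platform.
    with open(os.path.join(workdir, name), "w") as f:
        f.write("")

# glob: collect the paths that match a wildcard pattern.
matches = glob.glob(os.path.join(workdir, "*.csv"))
csv_files = sorted(os.path.basename(path) for path in matches)

# datetime: date arithmetic with timedelta.
start = datetime(2020, 1, 31)
next_day = start + timedelta(days=1)

# sys: information about the running interpreter.
major_version = sys.version_info.major

print(csv_files)                     # names of the matching files
print(next_day.date().isoformat())   # the day after 31 January 2020
print(major_version)
```

Note that glob returns matches in arbitrary order, which is why the sketch sorts them before use.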

Useful Functions

  • math - mathematical functions.

  • random - (pseudo) random number generators.

  • re - regular expressions.
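
A short sketch of one typical call from each of these modules follows. The input string and the seed value are arbitrary choices for the demo.

```python
import math
import random
import re

# math: common mathematical functions.
hypotenuse = math.hypot(3, 4)  # length of the hypotenuse of a 3-4-5 triangle

# random: seeding the generator makes the pseudo-random sequence repeatable.
random.seed(42)
first = [random.randint(1, 6) for _ in range(5)]
random.seed(42)
second = [random.randint(1, 6) for _ in range(5)]
same_sequence = first == second

# re: extract every standalone four-digit number from a string.
text = "Released in 1991, rewritten in 2008."
years = re.findall(r"\b\d{4}\b", text)

print(hypotenuse)
print(same_sequence)
print(years)
```

Re-seeding with the same value reproduces the same draws, which is useful when an analysis needs to be repeatable.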

File Formats

  • json - support for working with JSON files.

  • csv - support for working with CSV files.
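
The sketch below round-trips a small record through both formats. It uses in-memory strings rather than files on disk to stay self-contained, and the record itself is invented for the demo.

```python
import csv
import io
import json

# json: convert a Python object to a JSON string and back.
record = {"name": "Ada", "scores": [90, 85]}
encoded = json.dumps(record)
restored = json.loads(encoded)

# csv: write rows to an in-memory buffer, then read them back as dictionaries.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "score"])
writer.writerow(["Ada", 90])

buffer.seek(0)  # rewind so the reader starts at the beginning
rows = [dict(row) for row in csv.DictReader(buffer)]

print(restored == record)
print(rows)
```

One thing to keep in mind is that csv reads every field back as a string, so numeric columns need an explicit conversion (here, 90 comes back as "90").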

Data Objects

  • collections - container data types.

  • pickle - serializing & de-serializing (saving and loading complex objects).
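
As a minimal sketch of both modules, the example below counts and groups some made-up words with collections, then saves and restores the result with pickle (in memory, rather than to a file).

```python
import pickle
from collections import Counter, defaultdict

words = ["a", "b", "a", "c", "a", "b"]

# Counter: tally how often each item occurs.
counts = Counter(words)
most_common = counts.most_common(1)

# defaultdict: group items without checking whether the key exists yet.
by_initial = defaultdict(list)
for fruit in ["apple", "avocado", "banana"]:
    by_initial[fruit[0]].append(fruit)

# pickle: serialize an object to bytes, then load it back.
# Only unpickle data from sources you trust, since loading can run arbitrary code.
blob = pickle.dumps({"counts": counts, "groups": dict(by_initial)})
loaded = pickle.loads(blob)

print(most_common)
print(dict(by_initial))
print(loaded["counts"] == counts)
```

To persist the object between sessions, pickle.dump and pickle.load do the same thing with a file opened in binary mode.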