Data Science in Practice

Data Science in Practice is an open set of materials for learning introductory data science.

This website is a public version of the Data Science in Practice course, taught as COGS 108 at UC San Diego.

If you are enrolled in COGS 108 at UC San Diego, note that this website is not the same as the materials and coursework for the class.

Overview

The goal of Data Science in Practice is to introduce the practical elements of doing data science.

Data science is an emerging and multidisciplinary field, organized around the practice of analyzing data, and all the questions, practices and problems that entails.

These materials focus on the practical elements of finding, analyzing, interpreting, and contextualizing data, in order to practice answering questions with data.

Requirements

These materials use the Python programming language, and presume familiarity with Python and its standard library.

The tutorials explain how to install Python and which dependencies are needed.

Content

Available materials include:

  • Tutorials, which introduce key topics for doing data science

    • These can be used to explore and learn about key topics

  • Assignments, which are problem sets to work through

    • These can be used to practice key skills and ideas with code

  • Projects, which describe how to pursue independent analysis projects

    • These can be used as a guide for continuing on to real data science projects

All the materials are listed in the table of contents in the left sidebar.

Note that these materials are not intended as fully detailed or formal descriptions of the topics they introduce.

Rather, they seek to introduce key topics, demonstrate them in code, and allow for interaction, exploration and practice.

Put another way, these materials are designed to be more of a map than an encyclopedia.

For further information on the topics introduced, these materials link to external resources.

How to Use These Materials

These materials are created as Jupyter Notebooks, and are intended to be executed and explored in a hands-on manner.

A download link at the top left of each page can be used to download that page as a notebook. This lets you use the notebook locally, executing the code and answering the questions.

Issue Tracking

If you find any bugs or issues, or have any suggestions for these materials, please open an issue.

Source Materials

This set of materials is an openly available version of tutorials and coursework developed for and used in a university undergraduate course, COGS 108, which is taught at UC San Diego.

These materials may still contain some references to the university course or to grading, which can be ignored.

You can find more information about the university course in the overview repository.

The materials for this open version of the course are managed through this GitHub organization.

The source repository for this website is available here.

Reference

This project is described in the following paper:

Donoghue T, Voytek B, & Ellis S (2022). Course Materials for Data Science in Practice. Journal of Open Source Education, 5(51), 121. DOI: 10.21105/jose.00121

Direct Link: https://doi.org/10.21105/jose.00121

License

The materials on this website are openly available under a CC-BY 4.0 license.

Acknowledgments

The university course these materials are adapted from was originally created by Bradley Voytek, and is currently taught primarily by Shannon Ellis. This website and many of the materials were developed by Tom Donoghue, with additional contributions from the course staff.