CS109B: Advanced Topics in Data Science

Spring 2018


Welcome to Data Science 2 (DS2)! The course is listed as CS109b, STAT121b, and AC209b, and is offered through the Harvard University Extension School as the distance-education course CSCI E-109b.

The requirements are the same for all four listings of the course, except that students registered for AC209b may have additional work.

What is this class about?

Data Science 2 is the second half of a one-year introduction to data science. Building on the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, and statistical modeling and prediction. Topics include big data and database management, basic Bayesian methods, nonlinear statistical models, unsupervised learning, and topic models. The final module covers several deep learning topics, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and autoencoders. The main programming languages are R and Python.


This course can only be taken after successful completion of CS 109a, AC 209a, Stat 121a, or CSCI E-109a. Students who have previously taken CS 109, AC 209, Stat 121, or CSCI E-109 cannot take this class for credit.

Recommended Textbooks

ISLR: An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani (Springer: New York, 2013)

DL: Deep Learning by Goodfellow, Bengio, and Courville (MIT Press: Cambridge, MA, 2016)

Free electronic versions of both books are available (ISLR, DL), as are hardcopies through Amazon ([ISLR](https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=sr_1_1?ie=UTF8), DL).