CS109B: Advanced Topics in Data Science
Spring 2018
Instructors
Pavlos Protopapas (Computer Science)
Mark Glickman (Statistics)
Teaching Assistants

Katy McKeough kathrynmckeough@g.harvard.edu

Kevin Wu kewu93@gmail.com

Eric Wu eric_wu@g.harvard.edu

David Wihl davidwihl@g.harvard.edu

Rashmi Banthia rjain29@gmail.com

Zona Kostic zonakostic@g.harvard.edu

Eleni Kaxiras eleni@seas.harvard.edu

Nicholas Ruta nruta@g.harvard.edu

Sol Girouard solgirouard@g.harvard.edu

Samuel Plank samuelplank@college.harvard.edu
Welcome to Data Science 2 (DS2)! The course is listed as CS109b, STAT121b, and AC209b, and offered through the Harvard University Extension School as distance education course CSCI E109b.
The requirements are the same under all four listings, except that students registered for AC209b may be assigned additional work.
What is this class about?
Data Science 2 is the second half of a one-year introduction to data science. Building on the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, and statistical modeling and prediction. Topics include big data and database management, basic Bayesian methods, nonlinear statistical models, unsupervised learning, and topic models. The final module covers several deep learning topics, such as CNNs, RNNs, and autoencoders. The main programming languages used will be R and Python.
Prerequisites
This course can only be taken after successful completion of CS 109a, AC 209a, Stat 121a, or CSCI E109a. Students who have previously taken CS 109, AC 209, Stat 121, or CSCI E109 cannot take this class for credit.
Recommended Textbooks
ISLR: An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani (Springer: New York, 2013)
DL: Deep Learning by Goodfellow, Bengio and Courville.
Free electronic versions are available (ISLR, DL), as are hard copies through Amazon (ISLR: https://www.amazon.com/IntroductionStatisticalLearningApplicationsStatistics/dp/1461471370/ref=sr_1_1?ie=UTF8, DL).