CS109B: Advanced Topics in Data Science
- Pavlos Protopapas (Computer Science)
- Mark Glickman (Statistics)
Katy McKeough email@example.com
Kevin Wu firstname.lastname@example.org
Eric Wu email@example.com
David Wihl firstname.lastname@example.org
Rashmi Banthia email@example.com
Zona Kostic firstname.lastname@example.org
Eleni Kaxiras email@example.com
Nicholas Ruta firstname.lastname@example.org
Sol Girouard email@example.com
Samuel Plank firstname.lastname@example.org
Welcome to Data Science 2 (DS2)! The course is listed as CS109b, STAT121b, and AC209b, and is offered through the Harvard University Extension School as the distance education course CSCI E-109b.
The requirements are the same under all four listings, except that students registered for AC209b may have additional work.
What is this class about?
Data Science 2 is the second half of a one-year introduction to data science. Building upon the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, and statistical modeling and prediction. Topics include big data and database management, basic Bayesian methods, nonlinear statistical models, unsupervised learning, and topic models. The final module covers several deep learning topics, such as CNNs, RNNs, and autoencoders. The major programming languages used will be R and Python.
This course can only be taken after successful completion of CS 109a, AC 209a, Stat 121a, or CSCI E-109a. Students who have previously taken CS 109, AC 209, Stat 121, or CSCI E-109 cannot take this class for credit.
ISLR: An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani (Springer: New York, 2013)
DL: Deep Learning by Goodfellow, Bengio and Courville.
Free electronic versions are available (ISLR, DL), or hardcopies through Amazon ([ISLR](https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=sr_1_1?ie=UTF8), DL).