CS109a: Introduction to Data Science



Fall 2021

Pavlos Protopapas and Natesh Pillai

Welcome to CS109a/STAT121a/AC209a, also offered by the DCE as CSCI E-109a, Introduction to Data Science. This course is the first half of a one-year introduction to data science. We will focus on analyzing data to make predictions using statistical and machine learning methods. Topics include data scraping, data management, data visualization, regression and classification methods, and deep neural networks. You will get ample practice through weekly homework assignments. The class material integrates the five key facets of an investigation using data:

1. data collection - data wrangling, cleaning, and sampling to get a suitable data set
2. data management - accessing data quickly and reliably
3. exploratory data analysis - generating hypotheses and building intuition
4. prediction or statistical learning
5. communication - summarizing results through visualization, stories, and interpretable summaries

Only one of CS 109a, AC 209a, or Stat 121a can be taken for credit. Students who have previously taken CS 109, AC 209, or Stat 121 cannot take CS 109a, AC 209a, or Stat 121a for credit.

Important Dates:
Tuesday 9/8 - HW1 released
Wednesday 9/8 - HW0 due at 11:59pm EST

Helpline: cs109a2021@gmail.com

Lectures: Mon & Wed 9:45-11am - SEC Room 1.321
Lab: Fri 9:45-11am - 114 Western Ave., Allston Room 2.111
Advanced Sections: Wed 12:45-2pm [starting 9/29] - SEC Room LL2.229 (See course schedule for dates)
Office Hours: Current Schedule Here With More To Come
Course material can be viewed in the public GitHub repository.

Previous Material