Draft Syllabus Subject to Change

Advanced Topics in Data Science (Spring 2022)

CS 109b, AC 209b, Stat 121b, or CSCI E-109b


Pavlos Protopapas (SEAS) & Mark Glickman (Statistics)

Lectures: Mon & Wed 9:45-11am SEC 1.321
Labs: Fri 9:45-11am SEC 1.321
Advanced Sections: Wed 2:15-3:30pm SEC 1.321 (starting 2/2; see schedule for specific dates)
Office Hours: See Ed Post

Prerequisites: CS 109a, AC 209a, Stat 121a, CSCI E-109a, or the equivalent.

Course Description

Advanced Topics in Data Science (CS109b) is the second half of a one-year introduction to data science. Building upon the material in Introduction to Data Science, the course introduces advanced methods for data wrangling, data visualization, statistical modeling, and prediction. Topics include big data; deep learning architectures such as CNNs, RNNs, language models, transformers, autoencoders, and generative models; Bayesian modeling; sampling methods; and unsupervised learning.

The programming language will be Python.

Tentative Course Topics

Course Objectives

Upon successful completion of this course, you should feel comfortable with the material mentioned above, and you will have gained experience working with others on real-world problems. The content knowledge, the project, and teamwork will prepare you for the professional world or further studies.

Course Components

Lectures, labs, and advanced sections will be live-streamed for Extension School students and can be accessed through the Zoom section on Canvas. Video recordings of the live stream will be made available to all students within 24 hours after the event, and will be accessible from the Lecture Video section on Canvas.


The class meets for lectures twice a week (M & W). Attending and participating in lectures is a crucial component of learning the material presented in this course. Students may be asked to complete short readings before certain lectures. Some lectures will also include real-time coding exercises which we will complete as a class.


Labs will be held every Friday. Labs will present guided, hands-on coding challenges to prepare students to complete the homework assignments successfully.

Advanced Sections

The course will include advanced sections for AC209b students, with each session covering a different topic. These 75-minute sessions will cover advanced material, such as the mathematical underpinnings of the methods seen in the main course lectures and labs, as well as extensions of those methods. The material covered in the advanced sections is required for all AC209b students. Tentative topics are:

Note: Advanced sections are not held every week. Consult the course calendar for exact dates.


There will be a midterm exam on Friday, March 25th, from 9:45-11am (regular lab time). The exam will likely consist of multiple-choice questions with a take-home coding component. More information to follow.


Beginning the last week of classes (4/25), students will form groups of 3-4 and be divided into breakout thematic sections to study an open problem in one of several domains. The domains are tentative at the moment but may include medicine, law, astronomy, e-commerce, government, and areas in the humanities. Each section will include lectures by Harvard faculty who are experts in the field. Project work will continue through reading period and conclude with final submissions on 5/6. The final submission will consist of a written report, a Jupyter notebook with all relevant code, and a 6-minute pre-recorded presentation video.

Homework Assignments

There will be 7 graded homework assignments. Some will be due one week after being assigned and some two weeks after being assigned. For 5 of the assignments, you have the option to work and submit in pairs; the remaining 2 must be completed individually.

Standard assignments are graded out of 5 points.

AC209b students will have additional homework content, worth 1 point, for most assignments.

Course Resources

Online Materials

All course materials, including lecture notes, lab notes, and section notes, will be published on Ed, the course GitHub repo, and the public site's 'Materials' section.

Note: Lecture content for lectures 1-7 will only be accessible to registered students.

Assignments will only be posted on Canvas.

Working Environment

You will be working in Jupyter Notebooks, which you can run on your own machine or on the SEAS JupyterHub.

Recommended Textbooks

Articles & Excerpts

Getting Help

For questions about homework, course content, or package installation, the process is:

Course Policies and Expectations

Grading for CS109b, STAT121b, and AC209b:

Your final score for the course will be computed using the following weights:

Assignment                 Final Grade Weight
Paired Homework (5)        45%
Individual Homework (2)    23%
Midterm                    12%
Project                    20%
Total                      100%

Note: Each regular homework assignment (for everyone) is worth 5 points. The AC209b extra homework is worth 1 point.
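To make the weighting concrete, here is a minimal Python sketch of how the table above could combine into a final score. The syllabus does not say how points are normalized within a category, so this sketch assumes that earned points are pooled and divided by possible points in each category (for example, 6 possible points on an assignment that includes AC209b extra content). The function names and all scores are hypothetical and for illustration only; this is not the official grading script.

```python
# Category weights taken from the grading table (they sum to 100%).
WEIGHTS = {
    "paired_hw": 0.45,
    "individual_hw": 0.23,
    "midterm": 0.12,
    "project": 0.20,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def category_fraction(earned, possible):
    """Fraction of available points earned in one category.

    Assumption (not stated in the syllabus): points are pooled across
    the assignments in a category, e.g. 5 per regular homework, or 6
    when an assignment includes 1 point of AC209b extra content.
    """
    return sum(earned) / sum(possible)


def final_score(fractions):
    """Weighted final score on a 0-100 scale from per-category fractions in [0, 1]."""
    return 100 * sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical CS109b student: five paired and two individual homeworks.
    paired = category_fraction([5, 4, 5, 4.5, 5], [5, 5, 5, 5, 5])
    individual = category_fraction([4, 5], [5, 5])
    score = final_score({
        "paired_hw": paired,
        "individual_hw": individual,
        "midterm": 0.85,   # hypothetical midterm fraction
        "project": 0.90,   # hypothetical project fraction
    })
    print(f"{score:.1f}")  # prints 91.2
```

Pooling points means that an assignment with more possible points (such as one with AC209b extra content) counts slightly more within its category. Averaging per-assignment percentages would be an equally plausible reading of the table.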

Collaboration Policy

We expect you to adhere to the Harvard Honor Code at all times. Failure to adhere to the Honor Code and our policies may result in serious penalties, up to and including automatic failure in the course and referral to the Ad Board. If you work with a partner on an assignment, make sure both of you solve all the problems. Do not divide and conquer. You are expected to be intellectually honest and give credit where credit is due. In particular:

Late or Incorrectly Submitted Assignments

Each student is allowed up to 3 late days over the semester, with at most 1 day applied to any single homework. Outside of these allotted late days, late homework will not be accepted unless there is a medical reason (accompanied by a doctor's note) or another official University-excused reason. There is no need to ask before using one of your late days.
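The late-day rule has two limits: a semester budget of 3 days and a cap of 1 day per homework. The Python sketch below shows how the two limits interact. It assumes that lateness is counted in whole days, that homeworks are processed in submission order, and that a submission exceeding the per-homework cap is simply not accepted and uses none of the budget. The syllabus does not specify these details. Excused (medical or University-approved) lateness is outside the scope of this sketch, and the function name is hypothetical.

```python
def check_late_days(days_late, allowance=3, max_per_hw=1):
    """Decide which homeworks are accepted under the late-day policy.

    days_late: whole days each homework was late (0 = on time), in
    submission order. Returns (accepted flags, late days remaining).
    Assumption: a submission that is too late is rejected and does not
    consume any of the remaining late days.
    """
    remaining = allowance
    accepted = []
    for days in days_late:
        if days == 0:
            accepted.append(True)            # on time, no late day used
        elif days <= max_per_hw and days <= remaining:
            remaining -= days                # spend late days from the budget
            accepted.append(True)
        else:
            accepted.append(False)           # over the per-homework cap or out of budget
    return accepted, remaining


if __name__ == "__main__":
    # Hypothetical semester: HW5 is 2 days late (over the cap), and by
    # HW7 the 3-day budget has already been used on HW2, HW4, and HW6.
    flags, left = check_late_days([0, 1, 0, 1, 2, 1, 1])
    print(flags, left)
```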

If you forget to join a group with your partner and ask that both of you receive the same grade, we will accept this without penalty up to HW3. After that, we expect you to be familiar with the process of joining groups, so there will be a penalty of 1 point for both members of the group, provided the submission was on time.

Grading Guidelines

Homework will be graded based on:

  1. How correct your code is (the notebook cells should run; we will not troubleshoot your code).

  2. How you have interpreted the results: we want text, not just code. It should read as a report.

  3. How well you present the results.

The scale is 0 to 5 for each assignment and 0 to 1 for the additional AC209b content.

Re-grade Requests

Our graders and instructors make every effort to grade accurately and to give you extensive feedback.

If you discover that your answer to a homework problem was correct but was marked as incorrect, send an email to the Helpline describing the error. Please do not submit regrade requests based on what you perceive as overly harsh grading. The points we take off are based on a grading rubric that is applied uniformly to all assignments.

If you decide to send a regrade request, email the Helpline within 48 hours of the grade release with the subject line "Regrade HW1: Grader=johnsmith", replacing 'HW1' with the relevant assignment and 'johnsmith' with the name of the grader.

Auditing the Class

You are welcome to audit this course. To request access, send an email to with your HUID (required) and a statement of agreement to the terms below.

All auditors must agree to abide by the following rules:

Academic Integrity

Ethical behavior is an important trait of a data scientist, from handling data ethically to attributing the code and work of others. Thus, in CS109b we place strong emphasis on academic honesty. As a student, your best guidelines are to be reasonable and fair. We encourage teamwork on problem sets, but you should not split up the homework; work on all the problems together. For more detailed expectations, please refer to the Collaboration Policy section above.

You are responsible for understanding Harvard Extension School policies on academic integrity and how to use sources responsibly. Stated most broadly, academic integrity means that all course work submitted, whether a draft or a final version of a paper, project, take-home exam, online exam, computer program, oral presentation, or lab report, must be your own words and ideas, or the sources must be clearly acknowledged. The potential outcomes for violations of academic integrity are serious and ordinarily include all of the following: required withdrawal (RQ), which means a failing grade in the course (with no refund), the suspension of registration privileges, and a notation on your transcript.

Using sources responsibly is an essential part of your Harvard education. We provide additional information about our expectations regarding academic integrity on our website. We invite you to review that information and to check your understanding of academic citation rules by completing two free online 15-minute tutorials that are also available on our site. (The tutorials are anonymous open-learning tools.)

Accommodations for students with disabilities

Harvard students needing academic adjustments or accommodations because of a documented disability must present their Faculty Letter from the Accessible Education Office (AEO) and speak with the professor by the end of the second week of the term, (fill in specific date). Failure to do so may result in the Course Head's inability to respond in a timely manner. All discussions will remain confidential, although Faculty are invited to contact AEO to discuss appropriate implementation.

Harvard Extension School is committed to providing an inclusive, accessible academic community for students with disabilities and chronic health conditions. The Accessibility Services Office (ASO) offers accommodations and supports to students with documented disabilities. If you have a need for accommodations or adjustments in your course, please contact the Accessibility Services Office by email at or by phone at 617-998-9640.

Diversity and Inclusion Statement

Data Science and Computer Science have historically been representative of only a small sliver of the population. This is despite the contributions of a diverse group of early pioneers, such as Ada Lovelace, Dorothy Vaughan, and Grace Hopper, to name just a few.

As educators, we aim to build a diverse, inclusive, and representative community offering opportunities in data science to those who have been historically marginalized. We will encourage learning that advances ethical data science, exposes bias in the way data science is used, and advances research into fair and responsible data science.

We need your help to create a learning environment that supports a diversity of thoughts, perspectives, and experiences, and honors your identities (including but not limited to race, gender, class, sexuality, religion, and ability). To help accomplish this:

Our course will discuss diversity, inclusion, and ethics in data science. Please contact us (in person or electronically) or submit anonymous feedback if you have any suggestions for how we can improve.