Draft Syllabus Subject to Change

Advanced Topics in Data Science (Spring 2021)

CS 109b, AC 209b, Stat 121b, or CSCI E-109b


Pavlos Protopapas (SEAS), Mark Glickman (Statistics), & Chris Tanner (SEAS)

Lectures: Mon, Wed, & Fri 9‐10:15am
Sections: Fri 10:30am (starting 3/5)
Advanced Sections: Wed 12pm (starting 3/10)
Office Hours: TBD

Prerequisites: CS 109a, AC 209a, Stat 121a, or CSCI E-109a or the equivalent.

Course Description

Advanced Topics in Data Science (CS109b) is the second half of a one-year introduction to data science. Building upon the material in Introduction to Data Science, the course introduces advanced methods for data wrangling, data visualization, statistical modeling, and prediction. Topics include big data; deep learning architectures such as CNNs, RNNs, language models, transformers, autoencoders, and generative models; basic Bayesian methods; nonlinear statistical models; and unsupervised learning.

The programming language will be Python.

Tentative Course Topics

Course Objectives

Upon successful completion of this course, you should feel comfortable with the material mentioned above, and you will have gained experience working with others on real-world problems. The content knowledge, the project, and teamwork will prepare you for the professional world or further studies.

Course Components

Lectures, sections, and advanced sections will be live-streamed and can be accessed through the Zoom section on Canvas. Video recordings of the live stream will be made available within 24 hours after the event, and will be accessible from the Lecture Video section on Canvas.


The class meets, virtually, three days a week for lectures (M, W, F). Mondays and Wednesdays will be mostly lecture content with some hands-on coding, whereas Fridays will be the inverse (mostly hands-on coding). Attending and participating in lectures is a crucial component of learning the material presented in this course.

What to Expect: Class Structure

A lecture will have the following pedagogy layout which will be repeated:


At the end of each lecture, there will be a short, graded quiz that will cover the pre-class and in-class material; there will be no AC209b content in the quizzes. The quizzes will be available until the next lecture.

25% of the quizzes will be dropped from your grade.


Lectures will include one or more coding exercises focused on the newly introduced material; there will be no AC209b content in the exercises. The exercises are short enough to be completed during the time allotted in lecture, but they will remain available until the beginning of the following lecture to accommodate those who cannot attend in real time.

25% of the exercises will be dropped from your grade. Exercises will only be factored into your grade if they would improve your final grade.
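The 25% drop rule for quizzes and exercises can be illustrated with a minimal Python sketch. It assumes the dropped items are the lowest-scoring ones and that the number dropped is rounded down; the syllabus does not state either detail. The function name `average_after_drops` and the sample scores are hypothetical.

```python
import math


def average_after_drops(scores, drop_fraction=0.25):
    """Average the scores after dropping the lowest drop_fraction of them.

    Assumption (not stated in the syllabus): the dropped items are the
    lowest-scoring ones, and the number dropped is rounded down.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 <= drop_fraction < 1:
        raise ValueError("drop_fraction must be in [0, 1)")
    n_drop = math.floor(len(scores) * drop_fraction)
    kept = sorted(scores)[n_drop:]  # discard the n_drop lowest scores
    return sum(kept) / len(kept)


# Hypothetical quiz scores (percentages), for illustration only.
quiz_scores = [100, 80, 0, 90, 60, 100, 70, 50]
print(average_after_drops(quiz_scores))
```

With eight scores, two are dropped under these assumptions, so one missed quiz and one weak quiz would not affect the average.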


Lectures are supplemented by sections led by teaching fellows. There are two types of sections:

Standard Sections

These will be a mix of material review and practice problems similar to the homework.

Advanced Sections

The course will include advanced sections for 209b students, each covering a different topic. These are 75-minute lectures on advanced topics such as the mathematical underpinnings of the methods seen in lecture and lab, and extensions of those methods. The material covered in the advanced sections is required for all AC209b students. Tentative topics are:

Note: Sections are not held every week. Consult the course calendar for exact dates.


There are no exams in this course.


During the final four (4) weeks of the course, students will be divided into thematic break-out sections, each focused on a topic. These topics are tentative at the moment but may include medicine, law, astronomy, e-commerce, and government. Each section will include lectures by Harvard faculty, experts in the field, followed by project work also led by that faculty. You will get to present your projects at the SEAS Design Fair at the end of the semester.

Homework Assignments

There will be eight graded homework assignments. Some of them will be due one week after being assigned and some will be due two weeks after being assigned. For six assignments, you have the option to work and submit in pairs; the remaining two are to be completed individually.

Course Resources

Online Materials

All course materials, including lecture notes, lab notes, and section notes, will be published on the course GitHub repo as well as in the public site's Materials section.
Note: Lecture content for weeks 1-3 is only available to registered students through the Materials section.
Assignments will only be posted on Canvas.

Working Environment

You will be working in Jupyter Notebooks, which you can run on your own machine or on the SEAS JupyterHub.

Recommended Textbooks

Getting Help

For questions about homework, course content, or package installation, the process is:

Course Policies and Expectations

Grading for CS109b, STAT121b, and AC209b (tentative):

Your final score for the course will be computed using the following weights:

Assignment                  Final Grade Weight
Paired Homework (6)         47%
Individual Homework (2)     21%
Quizzes                     8%
Ed Exercises                4%
Project                     20%
Total                       100%

Note: Regular homework (for everyone) counts as 5 points. 209b extra homework counts as 1 point.
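As a rough illustration of how the weights above combine, the following Python sketch computes a weighted final score. It assumes each component has already been reduced to a percentage between 0 and 100; the function name `final_score` and the sample scores are hypothetical, and the sketch does not model the rule that exercises count only if they improve your grade.

```python
# Weights from the grading table, expressed as fractions of the final score.
WEIGHTS = {
    "paired_homework": 0.47,
    "individual_homework": 0.21,
    "quizzes": 0.08,
    "ed_exercises": 0.04,
    "project": 0.20,
}


def final_score(component_scores):
    """Return the weighted final score on a 0-100 scale.

    component_scores maps each component name in WEIGHTS to a
    percentage (0-100). This is an illustrative assumption about how
    component scores are represented, not the official calculation.
    """
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example = {
    "paired_homework": 90.0,
    "individual_homework": 85.0,
    "quizzes": 100.0,
    "ed_exercises": 75.0,
    "project": 80.0,
}
print(round(final_score(example), 2))
```

Because homework carries 68% of the total weight, a few points gained or lost there move the final score far more than the same change in quizzes or exercises.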

Collaboration Policy

We expect you to adhere to the Harvard Honor Code at all times. Failure to adhere to the honor code and our policies may result in serious penalties, up to and including automatic failure in the course and referral to the Ad Board. If you work with a partner on an assignment, make sure both parties solve all the problems. Do not divide and conquer. You are expected to be intellectually honest and give credit where credit is due. In particular:

Late or Wrongly Submitted Assignments

There are no late days for homework submission. We will accept late submissions only for medical reasons and only if accompanied by a doctor's note.

To submit after Canvas has closed or to ask for an extension, send an email to the Helpline with subject line "Submit HW1: Reason=the flu", replacing 'HW1' with the name of the current assignment and 'the flu' with your reason. You must attach the note from your medical provider; otherwise, we will not accept the request.

If you forgot to join a Group with your partner and are asking for the same grade, we will accept this with no penalty through HW3. Beyond HW3, we expect you to be familiar with the process of joining groups, so there will be a penalty of -1 point for both members of the group, provided the submission was on time.

Re-grade Requests

Our graders and instructors make every effort to grade accurately and to give you ample feedback.

If you discover that your answer to a homework problem was correct but was marked as incorrect, send an email to the Helpline with a description of the error. Please do not submit regrade requests based on what you perceive as overly harsh grading. The points we take off are based on a grading rubric that is applied uniformly to all assignments.

If you decide to send a regrade request, email the Helpline within 48 hours of the grade release with subject line "Regrade HW1: Grader=johnsmith", replacing 'HW1' with the current assignment and 'johnsmith' with the name of the grader.

Auditing the Class

You are welcome to audit this course. To request access, send an email to with your HUID (required) and a statement of agreement to the terms below.

All auditors must agree to abide by the following rules:

Academic Integrity

Ethical behavior is an important trait of a Data Scientist, from ethically handling data to attributing the code and work of others. Thus, in CS109b we place strong emphasis on Academic Honesty. As a student, your best guidelines are to be reasonable and fair. We encourage teamwork for problem sets, but you should not split the homework and you should work on all the problems together. For more detailed expectations, please refer to the Collaboration Policy section above.

You are responsible for understanding Harvard Extension School policies on academic integrity and how to use sources responsibly. Stated most broadly, academic integrity means that all course work submitted, whether a draft or a final version of a paper, project, take-home exam, online exam, computer program, oral presentation, or lab report, must be your own words and ideas, or the sources must be clearly acknowledged. The potential outcomes for violations of academic integrity are serious and ordinarily include all of the following: required withdrawal (RQ), which means a failing grade in the course (with no refund), the suspension of registration privileges, and a notation on your transcript.

Using sources responsibly is an essential part of your Harvard education. We provide additional information about our expectations regarding academic integrity on our website. We invite you to review that information and to check your understanding of academic citation rules by completing two free online 15-minute tutorials that are also available on our site. (The tutorials are anonymous open-learning tools.)

Accommodations for students with disabilities

Harvard students needing academic adjustments or accommodations because of a documented disability must present their Faculty Letter from the Accessible Education Office (AEO) and speak with the professor by the end of the second week of the term. Failure to do so may result in the Course Head's inability to respond in a timely manner. All discussions will remain confidential, although Faculty are invited to contact AEO to discuss appropriate implementation.

Harvard Extension School is committed to providing an inclusive, accessible academic community for students with disabilities and chronic health conditions. The Accessibility Services Office (ASO) offers accommodations and supports to students with documented disabilities. If you have a need for accommodations or adjustments in your course, please contact the Accessibility Services Office by email at or by phone at 617-998-9640.

Diversity and Inclusion Statement

Data Science and Computer Science have historically been representative of only a small sliver of the population. This is despite the contributions of a diverse group of early pioneers - see Ada Lovelace, Dorothy Vaughan, and Grace Hopper for just a few examples.

As educators, we aim to build a diverse, inclusive, and representative community offering opportunities in data science to those who have been historically marginalized. We will encourage learning that advances ethical data science, exposes bias in the way data science is used, and advances research into fair and responsible data science.

We need your help to create a learning environment that supports a diversity of thoughts, perspectives, and experiences, and honors your identities (including but not limited to race, gender, class, sexuality, religion, ability, etc.). To help accomplish this:

Our course will discuss diversity, inclusion, and ethics in data science. Please contact us (in person or electronically) or submit anonymous feedback if you have any suggestions for how we can improve.