Syllabus
Advanced Topics in Data Science (Spring 2020)
CS 109b, AC 209b, Stat 121b, or CSCI E-109b
Instructors
Pavlos Protopapas (SEAS), Mark Glickman (Statistics), & Chris Tanner (SEAS)
Lab Leaders: Chris Tanner & Eleni Kaxiras
Lectures: Mon and Wed 1:30‐2:45pm in NW B103
Labs: Monday 4:30-5:45pm & 6:00-7:15pm in Pierce Hall 301 (identical material at both times)
Advanced Sections: Wed 4:30-5:45pm in Maxwell-Dworkin G115
Office Hours: See weekly calendar for times and locations
Prerequisites: CS 109a, AC 209a, Stat 121a, or CSCI E-109a or the equivalent.
- Course Description
- Tentative Course Topics
- Course Objectives
- Course Components
- Course Resources
- Course Policies and Expectations
Course Description
Advanced Topics in Data Science (CS109b) is the second half of a one-year introduction to data science. Building upon the material in Introduction to Data Science, the course introduces advanced methods for data wrangling, data visualization, statistical modeling, and prediction. Topics include big data, multiple deep learning architectures such as CNNs, RNNs, autoencoders, and generative models as well as basic Bayesian methods, nonlinear statistical models, and unsupervised learning.
The programming language will be Python.
Tentative Course Topics
- Smoothing and Additive Models
- Unsupervised Learning, Clustering
- Bayesian Modeling
- Convolutional Neural Networks
- Autoencoders
- Recurrent Neural Networks
- NLP / Text Analysis
- Variational AutoEncoders & Generative Models
- Generative Adversarial Networks
- (Deep) Reinforcement Learning
Course Objectives
Upon successful completion of this course, you should feel comfortable with the material mentioned above, and you will have gained experience working with others on real-world problems. The content knowledge, the project, and teamwork will prepare you for the professional world or further studies.
Course Components
A live video feed of lectures, labs, and advanced sections will be available only to continuing education students. Recordings will be available to all other students within 24 hours.
Video streams and recordings can be accessed from the Videos section on Canvas.
Lectures
The class meets twice a week for lectures. Attending lectures is a crucial component of learning the material presented in this course.
Labs
Lectures are supplemented by weekly lab sections that include additional discussion of the course material, hands-on programming exercises, and group activities.
In-Class Quizzes
At the end of each lecture, we will ask you to take a short, graded quiz on the material presented in class; there will be no AC209b content in the quizzes.
50% of the quizzes will be dropped from your grade.
DCE students' quizzes will not count toward their final grade.
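As a rough illustration of the 50% drop rule, the sketch below computes a quiz average after dropping half of the scores. The syllabus does not say which quizzes are dropped, so this assumes the lowest-scoring half is removed and that an odd count rounds the number of dropped quizzes down. The function name `quiz_average` and the sample scores are hypothetical.

```python
def quiz_average(quiz_scores, drop_fraction=0.5):
    """Average quiz score after dropping a fraction of the quizzes.

    Assumption (not stated in the syllabus): the dropped quizzes are
    the lowest-scoring ones, and the number dropped is rounded down.
    """
    if not 0 <= drop_fraction < 1:
        raise ValueError("drop_fraction must be in [0, 1)")
    if not quiz_scores:
        return 0.0
    n_drop = int(len(quiz_scores) * drop_fraction)  # round down
    kept = sorted(quiz_scores)[n_drop:]             # keep the highest scores
    return sum(kept) / len(kept)


# Made-up scores for six quizzes, each out of 10.
scores = [10, 4, 8, 0, 6, 9]
avg = quiz_average(scores)  # the three lowest scores (0, 4, 6) are dropped
```

Under these assumptions a missed quiz simply counts as one of the dropped scores, as long as no more than half are missed.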
Advanced Sections
The course will include advanced sections for 209b students, covering a different topic each week. These are 75-minute lectures on advanced topics such as the mathematical underpinnings of the methods seen in lecture and lab, and extensions of those methods. The material covered in the advanced sections is required for all AC209b students. Tentative topics are:
- ResNet, DenseNet, ResNeXt, Inception, and transfer learning
- Segmentation techniques, YOLO, U-Net, and Mask R-CNN
- RNN, Echo State
- Variational Inference
- GANs, CycleGANs, etc.
- RL
Exams
There are no exams in this course.
Projects
Non-DCE Students
During the final four (4) weeks of the course, students will be divided into thematic break-out sections, each studying a particular topic area. The topics are tentative at the moment but may include medicine, law, astronomy, e-commerce, and government. Each section will include lectures by Harvard faculty who are experts in the field, followed by project work led by the same faculty. You will get to present your projects at the SEAS Design Fair at the end of the semester.
DCE Students
The goal of the project is to carry out a complete end-to-end data science process, encompassing both semesters of subject material, while working as a 3-4 person team. We will supply a small set of project choices within the thematic categories. Teams may propose a different project with sufficient notice; such proposals are subject to approval by the course staff.
Homework Assignments
There will be eight graded homework assignments. Some of them will be due one week after being assigned and some will be due two weeks after being assigned. For six assignments, you have the option to work and submit in pairs; the remaining two are to be completed individually.
Course Resources
Online Materials
All course materials, including lecture notes, lab notes, and section notes will be published on the course GitHub repo as well as the public site's Materials section.
Note: Lecture content for weeks 1-3 is only available to registered students through the Materials section.
Assignments will only be posted on Canvas.
Working Environment
You will be working in Jupyter Notebooks, which you can run on your own machine or in the SEAS JupyterHub cloud (see Lab 1 for details).
Recommended Textbooks
- ISLR: An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani (Springer: New York, 2013)
- DL: Deep Learning by Goodfellow, Bengio and Courville. (The MIT Press: Cambridge, 2016)
Free electronic versions are available (ISLR, DL), as are hard copies through Amazon (ISLR, DL).
Getting Help
For questions about homework, course content, or package installation, the process is:
- try to troubleshoot yourself by reading the lecture, lab, and section notes, and looking up online resources.
- go to office hours; this is the best way to get help.
- post on the class Ed forum; we want you and your peers to engage in helping each other. TFs also monitor Ed and will respond within 24 hours. Note that Ed questions are visible to everyone. If you are citing homework solution code you must post privately so that only the staff sees your message.
- watch for official announcements via Canvas, and make sure you have your Canvas notifications turned on. Ed should always be your first resource for seeking answers to your content questions.
- send an email to the Helpline cs109b2020@gmail.com for administrative issues, regrade requests, and non-content specific questions.
- for personal matters that you do not feel comfortable sharing with the TFs, you may send an email to any of the instructors.
Course Policies and Expectations
Grading for CS109b, STAT121b, and AC209b (tentative):
Your final score for the course will be computed using the following weights:
Non-DCE Students
Assignment | Final Grade Weight |
---|---|
Paired Homework (6) | 47% |
Individual Homework (2) | 21% |
Quizzes | 8% |
Ed Exercises | 4% |
Project | 20% |
Total | 100% |
Note: Regular homework (for everyone) counts as 5 points. 209b extra homework counts as 1 point.
DCE Students
Assignment | Final Grade Weight |
---|---|
Paired Homework (6) | 51% |
Individual Homework (2) | 23% |
Ed Exercises | 4% |
Project | 22% |
Total | 100% |
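To make the weighting concrete, the sketch below computes a final score as a weighted average using the percentages from the two tables above. It assumes each component has already been reduced to a single score on a 0-100 scale; the function name `final_score` and the example scores are hypothetical, and the staff's actual computation may differ.

```python
# Weights copied from the two tables above, in percent.
NON_DCE_WEIGHTS = {
    "paired_homework": 47,
    "individual_homework": 21,
    "quizzes": 8,
    "ed_exercises": 4,
    "project": 20,
}
DCE_WEIGHTS = {
    "paired_homework": 51,
    "individual_homework": 23,
    "ed_exercises": 4,
    "project": 22,
}


def final_score(component_scores, weights):
    """Weighted average of component scores, each on a 0-100 scale."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    missing = set(weights) - set(component_scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Components without a weight (e.g. quizzes for DCE) are ignored.
    return sum(weights[name] * component_scores[name] for name in weights) / 100


# Hypothetical student; these scores are made up for illustration.
example_scores = {
    "paired_homework": 90,
    "individual_homework": 80,
    "quizzes": 100,
    "ed_exercises": 100,
    "project": 85,
}
non_dce_total = final_score(example_scores, NON_DCE_WEIGHTS)
dce_total = final_score(example_scores, DCE_WEIGHTS)
```

With the same component scores, the two schemes give slightly different totals because the quiz weight is redistributed across homework and the project for DCE students.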
Collaboration Policy
We expect you to adhere to the Harvard Honor Code at all times. Failure to adhere to the Honor Code and our policies may result in serious penalties, up to and including automatic failure in the course and referral to the Ad Board. If you work with a partner on an assignment, make sure both parties solve all the problems. Do not divide and conquer. You are expected to be intellectually honest and give credit where credit is due. In particular:
- if you work with a fellow student but decide to submit different papers, include each other's names in the designated area of the submission paper.
- if you work with a fellow student and want to submit the same paper, you need to form a group prior to the submission. Details are in the assignment. Not all assignments will permit group submissions.
- you need to write your solutions entirely on your own or with your collaborator.
- you are welcome to take ideas from code presented in labs, lecture, or sections but you need to change it, adapt it to your style, and ultimately write your own. We do not want to see code copied verbatim from the above sources.
- if you use code found on the internet, books, or other sources you need to cite those sources.
- you should not view any written materials or code created by other students for the same assignment.
- you may not provide or make available solutions to individuals who take or may take this course in the future.
- if the assignment allows it you may use third-party libraries and example code, so long as the material is available to all students in the class and you give proper attribution. Do not remove any original copyright notices and headers.
Late or Wrongly Submitted Assignments
There are no late days in homework submission. We will accept late submissions only for medical reasons and if accompanied by a doctor's note.
To submit after Canvas has closed or to ask for an extension, send an email to the Helpline with subject line "Submit HW1: Reason=the flu", replacing "HW1" with the name of the current assignment and "the flu" with your reason. You must attach the note from your medical provider; otherwise we will not accept the request.
If you forgot to join a Group with your peer and are asking for the same grade, we will accept this with no penalty up to HW3. Beyond HW3, we expect you to be familiar with the process of joining groups, so there will be a penalty of -1 point for both members of the group, provided the submission was on time.
Re-grade Requests
Our graders and instructors make every effort to grade accurately and to give you ample feedback.
If you discover that your answer to a homework problem was correct but was marked as incorrect, send an email to the Helpline with subject line "Regrade HW1: Grader=johnsmith", replacing "HW1" with the current assignment and "johnsmith" with the name of the grader, within 48 hours of the grade release.
Auditing the Class
If you would like to audit the class, please send an email to the Helpline indicating who you are and why you want to audit the class. You need a HUID to be added to Canvas. Please note that auditors may not submit assignments for grading or make use of other limited student resources such as office hours.
Academic Integrity
Ethical behavior is an important trait of a data scientist, from handling data ethically to attributing the code and work of others. Thus, in CS109b we place a strong emphasis on academic honesty. As a student, your best guidelines are to be reasonable and fair. We encourage teamwork for problem sets, but you should not split the homework and you should work on all the problems together. For more detailed expectations, please refer to the Collaboration Policy section above.
Accommodations for students with disabilities
Students needing academic adjustments or accommodations because of a documented disability must present their Faculty Letter from the Accessible Education Office (AEO) and speak with the professor by the end of the second week of the term. Failure to do so may result in the Course Head's inability to respond in a timely manner. All discussions will remain confidential, although Faculty are invited to contact AEO to discuss appropriate implementation.
Diversity and Inclusion Statement
Data Science, like many fields of science, has historically only been represented by a small sliver of the population. This is despite some of the early computer science pioneers being women (see Ada Lovelace and Grace Hopper for two examples). Recent initiatives, such as Made w/ Code, have attempted to overcome some barriers to entry. We would like to discuss diversity in data science from time to time where appropriate and possible. Please contact us (in person or electronically) or submit anonymous feedback if you have any suggestions to improve the diversity of the course materials. Furthermore, we would like to create a learning environment for our students that supports a diversity of thoughts, perspectives and experiences, and honors your identities (including race, gender, class, sexuality, religion, ability, etc.). To help accomplish this:
- If you have a name and/or set of pronouns that differ from those that appear in your official Harvard records, please let us know!
- If you feel like your performance in the class is being impacted by your experiences outside of class, please don’t hesitate to come and talk with us. We want to be a resource for you. Remember that you can also submit anonymous feedback (which will lead to us making a general announcement to the class, if necessary to address your concerns). If you prefer to speak with someone outside of the course, you may find helpful resources at the Harvard Office of Diversity and Inclusion.
- We (like many people) are still in the process of learning about diverse perspectives and identities. If something was said in class (by anyone) that made you feel uncomfortable, please talk to us about it. (Again, anonymous feedback is always an option.)
- As a participant in course discussions, you should also strive to honor the diversity of your classmates.