Syllabus



Introduction to Data Science

CS 109A, STAT 121A, AC 209A, CSCI E-109A

Syllabus - Fall 2019

Welcome to CS109a/STAT121a/AC209a, also offered by the DCE as CSCI E-109a, Introduction to Data Science. This course is the first half of a one-year introduction to data science. The course focuses on the analysis of messy, real-life data to make predictions using statistical and machine learning methods.

The material of the course is divided into 3 modules. Each module will integrate the five key facets of an investigation using data:

1. data collection ‐ data wrangling, cleaning, and sampling to get a suitable data set;

2. data management ‐ accessing data quickly and reliably;

3. exploratory data analysis – generating hypotheses and building intuition;

4. prediction or statistical learning; and

5. communication – summarizing results through visualization, stories, and interpretable summaries.

Only one of CS 109a, AC 209a, or Stat 121a can be taken for credit. Students who have previously taken CS 109, AC 209, or Stat 121 cannot take CS 109a, AC 209a, or Stat 121a for credit.

Course Logistics

Prerequisites

You are expected to have programming experience at the level of CS 50 or above, and statistics knowledge at the level of Stat 100 or above (Stat 110 recommended). HW0 is designed to test your knowledge of the prerequisites. Successful completion of this assignment will show that this course is suitable for you. HW0 will not be graded, but you are required to submit it.


Course Components

Lectures

The class consists of two weekly lectures. Lectures are held Mon and Wed 1:30-2:45 pm in Northwest Building (NW), Lecture Hall B-103.

Attendance at lectures is mandatory for FAS students. We will have in-class quizzes to assess your understanding of the material and to help us identify gaps.

Labs

Labs are designed as hands-on in-class activities. The instructor will go over practice problems similar to the homework problems and review difficult material. Attendance at labs is optional but strongly encouraged.

Lab sessions are held Thur 4:30-5:45 pm in Pierce 301.

Sections

Lectures and labs are supplemented by 1 hour sections led by teaching fellows. There are two types of sections:

a) Standard Sections, which will be a mix of material review and practice problems similar to the HW.

Standard Sections are held Fri 10:30-11:45 am at 1 Story St. Room 306 and Mon 4:30-5:45 pm in Science Center 110.

Note: Sections are not held every week. Consult the course calendar for exact dates.

The material covered on Friday and Monday is identical.

b) Advanced Sections, which will cover advanced topics such as the mathematical underpinnings of the methods seen in lecture and lab, and extensions of those methods. The material covered in the Advanced Sections is required for all AC 209A students.

Advanced Sections are held Weds 4:30-5:45 pm at Maxwell Dworkin G115.

Note: Advanced Sections are not held every week. Consult the course calendar for exact dates.

Video Recordings

All learning instances—lectures, labs, and sections—will be recorded. DCE students can stream these events in real time with a TF monitoring the accompanying chat room. Recordings will then be made available to all students within 24 hours via Canvas.

Instructor Office Hours

Pavlos & Kevin: Monday 3-5pm, IACS lobby

Chris: Wednesday 3-4pm, Maxwell-Dworkin B125

Assignments

There will be an initial self-assessment homework called HW0 and 8 more graded homework assignments. Some of them will be due in a week (1, 2, 5, 8) and some of them in two weeks (3, 4, 6, 7).

You have the option to work and submit in pairs for all assignments except HW4 and HW7, which you will do individually.

You will be working in Jupyter Notebooks which you can run in your own environment or in the SEAS JupyterHub cloud.

Instructions for Setting up Your Environment

Instructions for Using JupyterHub

In weeks with a new assignment, it will be released by Wednesday at 3 pm.

Standard assignments are graded out of 5 points.

AC209a students will have additional homework content for most assignments worth 1 point.

Quizzes

Quizzes will be taken at the end of class and the material will be based on what was discussed in lecture; there will be no AC209a content in the quizzes. DCE students' quizzes will not count toward their final grade.

50% of the quizzes will be dropped from your grade.

Final Project

There will be a final group project (2-4 students) due during Exams period. See Calendar for specific dates.

Participation

Students are expected to be actively engaged with the course. This includes:

  1. attending lectures
  2. making use of resources such as office hours, labs, and sections
  3. participating in the Ed discussion forum — both through asking thoughtful questions and by answering the questions of others

DCE students will not be penalized for the inability to attend lectures, labs, etc. live.


Recommended Textbook

An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani.

The book is available here:

Free electronic version: http://www-bcf.usc.edu/~gareth/ISL/

HOLLIS: http://link.springer.com.ezp-prod1.hul.harvard.edu/book/10.1007%2F978-1-4614-7138-7

Amazon: https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370


Course Policies

Getting Help

For questions about homework, course content, package installation, or JupyterHub, first try to troubleshoot the problem yourself. If you still need help, the process is:

1. Post the question on Ed, where your peers may be able to answer. Note that questions on Ed are visible to everyone. The TFs monitor the posts.

2. Go to Office Hours. This is the best way to get help.

3. For private matters send an email to the Helpline: cs109a2019@gmail.com. The Helpline is monitored by TFs.

4. For personal and confidential matters send an email to the instructors.

Grading Guidelines and Regrading Policy

Homework will be graded based on:

1. How correct your code is (the Notebook cells should run; we will not troubleshoot your code).

2. How you have interpreted the results. We want text, not just code; your submission should read as a report.

3. How well you present the results.

The scale is 1-5.

For more details, check out The CS109A Grade.

Questions on Graded Homework and Regrading Policy

We take great care to ensure that all homework assignments are graded properly. However, if you find a grading oversight or error in your assignment, you may email the Helpline with the subject line “Regrade HW1: Grader=johnsmith” within 48 hours of the grade release (the grader’s name can be found at the end of your notebook). If you are still unsatisfied after the first regrade, you may email your reasons to the Helpline with the subject line “Regrade HW1: Second request” within 2 days of receiving the initial regrading response.

NOTE: a regrade may result in a grade lower than the initial grade.

Collaboration Policy

We encourage you to discuss the assignments with your fellow students (and on Ed), but you are not allowed to look at any other student’s assignment or code outside of your pair. Discussion is encouraged; copying is not allowed. Please refer to the Academic Honesty section in The CS109A Grade.

Late Day Policy

Homework is due on Wednesdays. There are no late days. Late submissions will not be accepted.

Communication from Staff to Students

Class announcements will be through Canvas. All homework and quizzes will be posted and submitted through Canvas, as well as all feedback forms.

NOTE: make sure you adjust your account settings so you can receive emails from Canvas.

Submitting an assignment

Please consult Homework Policies & Submission Instructions

Course Grade

Your final score for the course will be computed using the following weights:

Non-Extension Students

Assignment                 Final Grade Weight
Homework0                  1%
Paired Homework (6)        39%
Individual Homework (2)    17%
Quizzes                    10%
Project                    30%
Participation              3%
Total                      100%


Extension Students

Assignment                 Final Grade Weight
Homework0                  1%
Paired Homework (6)        43%
Individual Homework (2)    19%
Project                    33%
Participation              4%
Total                      100%
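
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component score is first expressed as a fraction between 0 and 1; the syllabus does not specify how component scores are normalized, so that convention, the function name final_score, the dictionary keys, and the example scores are all hypothetical.

```python
# Weights copied from the two tables above (percent of the final score).
WEIGHTS = {
    "non_extension": {
        "homework0": 1,
        "paired_homework": 39,
        "individual_homework": 17,
        "quizzes": 10,
        "project": 30,
        "participation": 3,
    },
    "extension": {
        "homework0": 1,
        "paired_homework": 43,
        "individual_homework": 19,
        "project": 33,
        "participation": 4,
    },
}


def final_score(component_scores, track="non_extension"):
    """Return the weighted final score on a 0-100 scale.

    component_scores maps each component name to a fraction in [0, 1].
    This normalization is an assumption made for the sketch, not course policy.
    """
    weights = WEIGHTS[track]
    missing = set(weights) - set(component_scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Components that carry no weight for this track (e.g. quizzes for
    # Extension students) are simply ignored.
    return sum(weights[name] * component_scores[name] for name in weights)


# Hypothetical scores, for illustration only.
example = {
    "homework0": 1.0,
    "paired_homework": 0.9,
    "individual_homework": 0.8,
    "quizzes": 0.7,
    "project": 0.85,
    "participation": 1.0,
}
print(round(final_score(example), 2))
print(round(final_score(example, track="extension"), 2))
```

Because each set of weights sums to 100, a student with full marks on every component receives 100 under either track.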


Software

We will be using Jupyter Notebooks, Python 3, and various Python modules. You can access the notebook viewer either on your own machine by installing the Anaconda platform, which includes Jupyter/IPython as well as all packages that will be required for the course, or by using the SEAS JupyterHub from Canvas. Details in class.

Accommodations for Students with Disabilities

Students needing academic adjustments or accommodations because of a documented disability must present their Faculty Letter from the Accessible Education Office (AEO) and speak with Kevin by the end of the third week of the term: Friday, September 15. Failure to do so may result in us being unable to respond in a timely manner. All discussions will remain confidential.

Diversity and Inclusion Statement

Data Science, like many fields of science, has historically been represented by only a small sliver of the population. This is despite some of the early computer science pioneers being women (see Ada Lovelace and Grace Hopper for two examples). Recent initiatives have attempted to overcome some barriers to entry: Made w/ Code.

We would like to attempt to discuss diversity in data science from time to time where appropriate and possible. Please contact us (in person or electronically) or submit anonymous feedback if you have any suggestions to improve the diversity of the course materials. Furthermore, we would like to create a learning environment for our students that supports a diversity of thoughts, perspectives and experiences, and honors your identities (including race, gender, class, sexuality, religion, ability, etc.). To help accomplish this:

If you have a name and/or set of pronouns that differ from those that appear in your official Harvard records, please let us know!

If you feel like your performance in the class is being impacted by your experiences outside of class, please don’t hesitate to come and talk with us. We want to be a resource for you. Remember that you can also submit anonymous feedback (which will lead to me making a general announcement to the class, if necessary to address your concerns). If you prefer to speak with someone outside of the course, you may find helpful resources at the Harvard Office of Diversity and Inclusion.

We (like many people) are still in the process of learning about diverse perspectives and identities. If something was said in class (by anyone) that made you feel uncomfortable, please talk to us about it. (Again, anonymous feedback is always an option.)

As a participant in course discussions, you should also strive to honor the diversity of your classmates.

Academic Honesty

Ethical behavior is an important trait of a Data Scientist, from ethically handling data to attributing the code and work of others. Thus, in CS109 we place strong emphasis on Academic Honesty.

As a student your best guidelines are to be reasonable and fair. We encourage teamwork for problem sets and allow submission in pairs for most problem sets, but you should not split the homework and you should work on all the problems together.

For more detailed expectations, please refer to the Academic Honesty section in The CS109A Grade.