Syllabus



Course description

Data Science 2 is the second half of a one-year introduction to data science. Building upon the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, deep neural networks, statistical modeling, and prediction. Topics include big data and database management; deep learning subjects such as CNNs, RNNs, autoencoders, and generative models; basic Bayesian methods; nonlinear statistical models; and unsupervised learning.

The programming language will be Python.

Tentative Course Topics

Course Objectives

Upon successful completion of this course, you should feel comfortable with the material described above, and you will have gained experience working with others on real-world problems. The content knowledge, the project, and the teamwork will prepare you for the professional world or for further studies.

Course Components

A live video feed of lectures, labs, and advanced sections will be available only to distance education students. Recordings will be available to all other students within 24 hours.

Lectures

The class meets twice a week for lectures. Attending lectures is a crucial component of learning the material presented in this course.

Labs

Lectures are supplemented by weekly programming labs. Labs are an important part of the course: they reinforce lecture material with examples, introduce programming environments, and teach you important practical skills.

In-class Quizzes

At the end of each lecture, we will ask you to take a short graded quiz on the material presented in class. These quizzes will count towards your final grade.

Advanced Sections

The course will include advanced sections for AC209 students, each covering a different topic per week. These are 75-minute lectures on advanced topics, such as the mathematical underpinnings of the methods seen in lectures and labs and extensions of those methods. The material covered in the advanced sections is required for all AC209 students. Tentative topics are:

Exams

There are no exams in this course.

Projects

 - For distance education students
The goal of the project is to carry out a complete, end-to-end data science process that draws on the subject material of both semesters while working in a team of 3-4 people. We will supply a small set of project choices within the thematic categories. Teams may propose a different project with sufficient notice; such proposals are subject to approval by the course staff.
 - For campus students
During the final four (4) weeks of the course, students will be divided into thematic break-out sections covering topics such as medicine, law, astronomy, e-commerce, and government. Each section will include lectures by Harvard faculty who are experts in the field, followed by project work led by the same faculty. You will present your project at the SEAS Design Fair at the end of the semester.

Homework Assignments

There will be seven graded homework assignments. Some will be due one week after being assigned and some two weeks after being assigned. For five assignments, you have the option to work and submit in pairs; the remaining two must be completed individually.

Course Resources

Online Materials

All course materials, including lecture notes, lab notes, and section notes, will be published in the class GitHub repository:

https://github.com/Harvard-IACS/2019-CS109B

Assignments will only be posted on Canvas.

Working environment

You will be working in Jupyter Notebooks, which you can run on your own machine or on the SEAS JupyterHub cloud (details to come).

Recommended Textbooks

Free electronic versions of both textbooks are available (ISLR, DL), and hard copies can be purchased through Amazon (ISLR, DL).

Getting Help

For questions about homework, course content, or package installation, the process is:

Course Policies and Expectations

Grading for CS109b, STAT121b, and CS209b:

Paired-option Homeworks 45% (five homework assignments for which you have the option to work in pairs)
Individual Homeworks 25% (two homework assignments which you must complete individually)
Quizzes 10% (you may drop 40% of the quizzes)
Project 20%


TOTAL 100%

Note: Regular homework (for everyone) counts as 5 points. 209b extra homework counts as 1 point.
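To make the weighting concrete, here is a minimal Python sketch of how the components above could combine into a final grade. It is not the course's official grading script. It assumes every score is a fraction between 0 and 1, that assignments within a category are averaged with equal weight, and that the dropped 40% of quizzes are the lowest-scoring ones (rounded down). The names WEIGHTS, drop_lowest, and final_grade are hypothetical, and the 209b extra-homework points mentioned in the note are not modeled.

```python
# Hypothetical sketch of the weighted grade; weights (in percent) come from the
# grading breakdown above. This is NOT the official grading script.
WEIGHTS = {
    "paired_homework": 45,
    "individual_homework": 25,
    "quizzes": 10,
    "project": 20,
}


def drop_lowest(scores, drop_pct=40):
    """Remove the lowest drop_pct percent of scores (assumption: count rounded down)."""
    n_drop = (len(scores) * drop_pct) // 100  # integer arithmetic avoids float rounding
    return sorted(scores)[n_drop:]


def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def final_grade(paired_hw, individual_hw, quizzes, project):
    """Combine component scores (fractions in [0, 1]) into a grade out of 100."""
    components = {
        "paired_homework": mean(paired_hw),
        "individual_homework": mean(individual_hw),
        "quizzes": mean(drop_lowest(quizzes)),
        "project": project,
    }
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Example: five paired homeworks, two individual ones, five quizzes, one project.
    grade = final_grade([0.9] * 5, [0.8, 0.7], [1.0, 0.0, 0.5, 0.8, 0.9], 0.85)
    print(round(grade, 2))
```

In the example, two of the five quizzes (the 0.0 and the 0.5) are dropped, so the quiz component averages the remaining three scores. Keeping the weights as integer percentages means they sum to exactly 100, which makes the breakdown easy to check.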

Grading for DCE:


TOTAL 100%

Collaboration Policy

We expect you to adhere to the Harvard Honor Code at all times. Failure to adhere to the honor code and our policies may result in serious penalties, up to and including automatic failure in the course and referral to the Ad Board. If you work with a partner on an assignment, make sure both of you solve all the problems. Do not divide and conquer. You are expected to be intellectually honest and to give credit where credit is due. In particular:

Late or Wrongly Submitted Assignments

There are no late days for homework submissions. We will accept late submissions only for medical reasons, and only if accompanied by a doctor's note.

To submit after Canvas has closed or to ask for an extension, send an email to the Helpline with the subject line "Submit HW1: Reason=the flu", replacing 'HW1' with the name of the current assignment and 'the flu' with your reason. You must attach the note from your medical provider; otherwise, we will not accept the request.

If you forget to join a group with your partner and ask for both of you to receive the same grade, we will accept this without penalty up to HW3. After HW3, we expect you to be familiar with the process of joining groups, so there will be a penalty of 1 point for both members of the group, provided the submission was on time.

Re-grade Requests

Our graders and instructors make every effort to grade accurately and to give you detailed feedback.

If you discover that your answer to a homework problem was correct but was marked as incorrect, send an email to the Helpline with a description of the error. Please do not submit regrade requests based on what you perceive as overly harsh grading. The points we take off are based on a grading rubric that is applied uniformly to all assignments.

Regrade requests must be sent within 48 hours of the grade release. To send one, email the Helpline with the subject line "Regrade HW1: Grader=johnsmith", replacing 'HW1' with the current assignment and 'johnsmith' with the name of the grader.

Auditing the Class

If you would like to audit the class, please send an email to the Helpline indicating who you are and why you want to audit the class. You need an HUID to be added to Canvas.

Academic Integrity

Ethical behavior is an important trait of a data scientist, from handling data ethically to attributing the code and work of others. Thus, in CS109 we place a strong emphasis on academic honesty. As a student, your best guidelines are to be reasonable and fair. We encourage teamwork on problem sets, but you should not split up the homework; work on all the problems together. For more detailed expectations, please refer to the Collaboration Policy section above.

Accommodations for students with disabilities

Students needing academic adjustments or accommodations because of a documented disability must present their Faculty Letter from the Accessible Education Office (AEO) and speak with the professor by the end of the second week of the term, (fill in specific date). Failure to do so may result in the Course Head's inability to respond in a timely manner. All discussions will remain confidential, although Faculty are invited to contact AEO to discuss appropriate implementation.

Diversity and Inclusion Statement

Data Science, like many fields of science, has historically been represented by only a small sliver of the population. This is despite some of the early pioneers of computer science being women (see Ada Lovelace and Grace Hopper for two examples). Recent initiatives, such as Made w/ Code, have attempted to overcome some barriers to entry. We will try to discuss diversity in data science from time to time where appropriate and possible. Please contact us (in person or electronically) or submit anonymous feedback if you have any suggestions for improving the diversity of the course materials. Furthermore, we would like to create a learning environment that supports a diversity of thoughts, perspectives, and experiences, and honors your identities (including race, gender, class, sexuality, religion, ability, etc.). To help accomplish this: