Syllabus
Time and Location
Lectures: Tuesday 1:30PM-2:45PM; Thursday 1:30PM-2:45PM
Location: Zoom meeting room accessible through Canvas
Labs: TBD
Location: Zoom section room accessible through Canvas
Staff
Lead Instructor: David Sondak
Teaching Fellows:
- Hayoun Oh
- Tosin Alliyu
- Simon Warchol
- Dovran Amanov
- Haipeng Lin
- George Touloumes
New Policies for Recording Classroom Sessions Conducted via Zoom
The following policies are taken from Rules and Best Practices for the Recording of Classroom Sessions Conducted via Zoom - IT Help. The main points are summarized below.
- Students are not permitted to record class sessions, including by using Zoom. Links to class session recordings will only be posted in the Zoom link of the Canvas course webpage.
- Students must not disclose any Zoom recording URL - or any copies of the recording the student might create or obtain - to anyone outside the class.
About the Course
Learning Outcomes
After completing this course you should be able to:
- Recognize and recall programming models, platforms, open-source tools and computing architectures that are relevant to computational and data science
- Assess computational approaches and their practical applications to scientific problems
- Apply the fundamentals of parallel computing, including abstract thinking and algorithmic development
- Identify which parallel strategy is appropriate to solve a given problem
- Design efficient parallel solutions to scientific problems
- Implement parallel programs in prominent programming models and evaluate their performance using various metrics
- Use a collection of open source software and state-of-the-art HPC platforms for data analysis, modelling, and visualization of real scientific problems
Prerequisites
Students are expected to have basic programming experience, familiarity with Python and C, basic knowledge of Linux including using the command line, and basic understanding of algorithms (CS107/AC207 or CS50).
Intended Audience
The course is aimed at students with a background in a scientific discipline who will not typically have a traditional Computer Science background, though basic programming knowledge is assumed as a prerequisite. We hope to attract students from the life sciences, physical sciences, economics, social sciences, medicine, and the humanities interested in developing applications for large-scale computational or data processing.
This course is also for computer science, engineering, and undergraduate students who need to make decisions about the architecture of a system, choose tools for solving a given problem and figure out how best to apply them, or better understand the strengths and weaknesses of existing systems and tools.
Required Textbook
None.
Suggested Textbooks
You may find the following texts helpful to supplement the content covered in the course. None of the homework problems or course content will be taken directly from any of these books. You should use them as supplementary material.
- Introduction to High Performance Computing for Scientists and Engineers; Georg Hager; CRC Press; July 2010
- Introduction to Parallel Computing; Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar; Pearson; January 2003
- Programming Massively Parallel Processors: A Hands-on Approach; David B. Kirk and Wen-mei W. Hwu; Morgan Kaufmann; February 2010
- The Art of Multiprocessor Programming; Maurice Herlihy, Nir Shavit; Morgan Kaufmann; June 2012
- Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems; Martin Kleppmann; O'Reilly Media; March 2017
- MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop; Adam Shook and Donald Miner; O'Reilly Media; December 2012
- Data Analysis in the Cloud: Models, Techniques and Applications; Domenico Talia, Paolo Trunfio, Fabrizio Marozzo; September 2015
- Cloud Computing for Science and Engineering; Ian Foster, Dennis Gannon; The MIT Press; 2017
Course Format
The course is designed to study and discuss the principles (reading assignments and lectures), to develop practical skills (hands-on sessions, programming assignments and infrastructure guides), to expose students to real-world experience (case studies and guest lectures), and to apply the concepts to solve a real-life problem (project).
Reading Assignments
Some lectures include a required reading to ensure that you are prepared for the activities in class. You are expected to complete the required reading and answer the associated questions. The questions will be posted on Piazza. Post your comments under the Piazza note announcing the publication of the reading assignment using the “Follow-up discussion” feature. Examples of good comments:
- Clarification of some point or detail in the readings
- Critiques of arguments made in the readings
- Analysis of implications or future directions for work discussed
- Questions about the readings or answers to other people's questions
- Links to web resources or examples that pertain to a reading
Typically you should set aside 1-2 hours to complete each reading assignment. Even though we do not expect you to fully understand everything before coming to class, you will often have to read some passages several times to build your understanding. The goal of the reading assignments is to prepare for class, to familiarize yourself with new terminology and definitions, and to determine which part of the subject needs more attention.
Lecture Sessions
Lecture attendance is strongly encouraged. You should treat the lecture as you would in a normal, in-person semester. However, lecture attendance is not strictly mandatory, in order to accommodate a variety of time zones. If you are in a time zone in which you are able to attend lecture, then you should do so. If your time zone precludes you from attending lecture, then you may watch the recorded lecture, which can be accessed via Canvas.
Lectures are organized under themes and include explanation of theoretical concepts to build a conceptual framework, and simple examples and case studies to illustrate the theory. They may also include discussion of reading assignments to develop problem-solving strategies and critical thinking. Please arrive on time. Lectures will be accompanied by concept quizzes to assess your understanding of the material and to help us identify gaps.
Hands-on Sessions
Hands-on session attendance should be treated in the same manner as lecture attendance. Hands-on sessions provide an opportunity to learn and practice the main programming models, which will be used in the programming assignments (homework) and the final project. Students should prepare the execution framework needed to do the exercises according to the guidelines provided by the instructors before coming to the session. The course includes hands-on exercises on AWS cloud and Harvard’s Cannon supercomputer.
Lab Sessions
Lab attendance is mandatory. These sessions count toward synchronous learning sessions and allow you to interact with members of the teaching staff. Lab sessions are used to allow students to become familiar with the computing and data processing infrastructure on AWS by following the infrastructure guides and to provide help with the homework (programming assignments) and the final project.
Please refer to the main page for the specific lab times. You may attend any lab session that you wish and as many sessions as you like in a given week. You may also change which lab you attend each week. There are no deliverables for the lab sessions. Your lab grade will be based on attendance.
Guest Lectures
Guest lectures are given by experts in the course topics and expose students to real-world experience with the models and platforms covered in the course.
Quizzes
Quizzes will be released on Canvas by the end of each lecture session and will be based on the material discussed in class. Each quiz will be available for 24 hours from the end of the corresponding lecture and you will have 5 minutes to take the quiz. There will be no retakes or makeup quizzes for any reason. Your lowest quiz score will be dropped.
Homework
Lectures are complemented by homework assignments to bridge theory and practice. Homework will mostly consist of basic programming assignments to exercise a technology or programming model. Homework assignments will be posted on the website on Mondays and will be due on a subsequent Monday (listed in the course schedule).
There are no late days. Please contact me directly in the event of an emergency.
Infrastructure Guides
Infrastructure guides help with the deployment of parallel computing and big data processing frameworks on the AWS cloud for developing, testing and evaluating the programming assignments and the final project.
Project
A major component of the course is a final programming project. Your final project is to solve a compute- or data-intensive scientific problem using the platforms, tools and systems introduced in the course. You will collect the data, implement the tool, and analyze the performance of an end-to-end application. You are required to form teams and to partition the work among the team members. The final project has six milestones:
1. Team formation
2. Rough draft of project proposal
3. An in-class presentation of your project proposal
4. An in-class presentation of your progress with the design of the project
5. Submission of project deliverables
6. Submission and final presentation to teaching staff
Further details about the project will be updated under the Projects page.
Exams
We will not have standard midterm or final exams. Instead, we will have in-class quizzes, two in-class presentations on project proposal and progress, and a final project presentation during the scheduled final exam period.
Piazza
We'll be using Piazza for online class discussions. We will also use Piazza for all course announcements. Piazza is your main venue to ask questions, discuss problems, and help each other out. It should always be your first recourse for seeking answers to your questions about the course, lecture or reading material, or the assignments. Participation on Piazza will factor into your participation grade for the course.
Office Hours
The instructors and the teaching fellows hold weekly office hours. Office hour times and locations are listed on the class homepage. Office hours provide you with an opportunity to review and discuss course materials as well as provide further guidance for your homework in a more intimate environment, with only your teaching fellow and maybe a handful of classmates present.
Grading
Relative Weighting
You will be graded on homework assignments, a final project, in-class quizzes, and participation. There will be no exams. The final grade will be composed as follows:
- Homework (40% - 400 points): Individual submission of assignments.
- Final Project (40% - 400 points): A project of your own design to be worked on in small teams.
- Quizzes (10% - 100 points): Assessments of your understanding of the material.
- Participation (10% - 100 points): Piazza posts, reading assignments, lab attendance.
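As a rough illustration of how these weights combine, the sketch below converts per-component results into an overall percentage. The point values come from the list above. The function name, the dictionary keys, and the example student are hypothetical, and how scores within each component are aggregated is not specified here, so this is only an assumption-laden sketch and not the official grade calculation.

```python
# Point values from the Relative Weighting list (1000 points total).
MAX_POINTS = {
    "homework": 400,
    "final_project": 400,
    "quizzes": 100,
    "participation": 100,
}


def final_grade(fractions):
    """Return the overall percentage, given the fraction (0.0 to 1.0)
    of the available points earned in each component."""
    for name in MAX_POINTS:
        if not 0.0 <= fractions[name] <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    earned = sum(MAX_POINTS[name] * fractions[name] for name in MAX_POINTS)
    return 100.0 * earned / sum(MAX_POINTS.values())


# Hypothetical student: 90% on homework, 85% on the project,
# 80% on quizzes, and full participation credit.
example = {
    "homework": 0.90,
    "final_project": 0.85,
    "quizzes": 0.80,
    "participation": 1.0,
}
print(f"{final_grade(example):.1f}")  # 360 + 340 + 80 + 100 = 880 of 1000 points
```

Because homework and the final project each carry 400 points, a 10% change in either one moves the overall grade by 4 percentage points, while the same change in quizzes or participation moves it by 1.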
Homework Grading
Homework will be graded based on 1) how correct your code is (the code should compile and run; we will not troubleshoot code), and 2) how you have interpreted the results in a report. Your work will be evaluated holistically, beyond mechanical correctness, with a focus on the overall quality of the work.
Homework Regrading
It is very important to us that all assignments are properly graded. If you believe there is an error in your assignment grading, please submit an explanation via email to us within 7 days of receiving the grade. No regrade requests will be accepted orally and no regrade requests will be accepted more than 7 days after you receive the grade for the assignment. Also, note that requesting a regrade applies to the entire assignment.
Late Days
No homework assignments or project milestones will be accepted for credit after the deadline. If you have a verifiable medical condition or other special circumstances that interfere with your coursework, please let us know as soon as possible.
Policies
Contingency Plan
In the event of a prolonged Zoom outage, the course lecture will be recorded offline and posted to Canvas as soon as possible.
Accessibility
Any student receiving accommodations through the Accessible Education Office should present their AEO letter as soon as possible. Failure to do so may prevent us from making appropriate arrangements.
Devices in Class
We will use laptops throughout the term to facilitate activities and project work in class. However, research and student feedback clearly show that using devices for activities unrelated to class harms not only your own learning, but other students’ learning as well. Therefore, we only allow device usage during activities that require devices. At all other times, you should not be using your device. We may help you remember this by announcing when to bring devices out and when to put them away. This will be hard to enforce in a Zoom class, but we ask for your cooperation in this matter.
Participation
Helping each other out and discussing the reading assignments and lectures is a key aspect of this course. All students are expected to contribute online on Piazza and during lectures. Participation on Piazza will contribute to the final grade.
Collaboration
You are welcome to discuss the course's material and homework with others in order to better understand it, but the work you turn in must be your own (with some exceptions, e.g., the final project, where work is explicitly shared). You are encouraged to discuss programming assignments with classmates, but should be open about such cooperation, and should attribute anyone you collaborated with in your homework.
There is a balance to be struck between submitting your own work (to demonstrate you're learning the material) and discussion/collaboration with others (to enhance learning through mutual assistance). In general, avoid sharing actual code (particularly code you hand in), but feel free to discuss, diagram, use pseudocode, and even share small amounts of code ("snippets"). If you are in doubt as to the appropriateness of some level of collaboration with other students, contact the course instructor.
The class staff will be using code analysis tools to compare students' work; plagiarism will not be tolerated, and students may be asked to work more independently if their work is too similar. You may not submit the same or similar work to this course that you have submitted or will submit to another, without permission. You must acknowledge any source code that was not written by you by mentioning the original author(s) directly in your source code (comment or header), or in a README.txt file accompanying your submission. Do not remove any original copyright notices and headers. All cases of academic dishonesty will be forwarded to Harvard College. For more information please consult the Harvard academic integrity guidelines: Academic Integrity and Academic Dishonesty.
Credits
The lecture material is adapted from the books and online research resources relevant to the course topics. Please contact us if you find materials where the credit is missing or that you would rather have removed.
Diversity and Inclusion Statement
Computer science, like many fields of science, has historically only been represented by a small portion of the population. This is despite some of the pioneers in computer science being from groups that are historically and presently underrepresented. Whenever possible, I will try to highlight the contributions of people from a variety of backgrounds. To start, here is a list of some really nice references:
I welcome any additions to this list you may have!
In an ongoing effort to foster a more inclusive environment in computer science, recent initiatives have attempted to overcome some barriers to entry for underrepresented groups:
Like the first list above, this list is not exhaustive, but I welcome any additions and suggestions you may have.
I would like to attempt to discuss diversity in computer science from time to time where appropriate and possible.
Please contact me (in person or electronically) or submit anonymous feedback if you have any suggestions to improve the quality of the course materials. The best way to provide anonymous feedback is to use Piazza, which allows you to provide comments anonymously.
Furthermore, I would like to create a learning environment for my students that supports a diversity of thoughts, perspectives and experiences, and honors your identities (including race, gender, class, sexuality, religion, ability, etc.). To help accomplish this:
- If you have a name and/or set of pronouns that differ from those that appear in your official Harvard records, please let me know!
- If you feel like your performance in the class is being impacted by your experiences outside of class, please don’t hesitate to come and talk with me. I want to be a resource for you. Remember that you can also submit anonymous feedback (which will lead to me making a general announcement to the class if necessary to address your concerns). If you prefer to speak with someone outside of the course, you may find helpful resources at the Harvard Office of Diversity and Inclusion.
- I (like many people) am still in the process of learning about diverse perspectives and identities. If something was said in class (by anyone) that made you feel uncomfortable, please talk to me about it. (Again, anonymous feedback is always an option.)
- As a participant in course discussions, you should also strive to honor the diversity of your classmates.