Extreme-scale data science, at the convergence of big data and massively parallel computing, is enabling simulation, modelling and real-time analysis of complex natural and social phenomena at unprecedented scales. The aim of the project is to gain practical experience with this interplay by applying parallel computation principles to solve a compute- and data-intensive problem.

Your final project is to solve a data-intensive or compute-intensive problem with parallel processing on the AWS cloud or on Harvard’s supercomputer, Odyssey (or both!). You will identify a compute or data science problem, analyse its compute scaling requirements, collect the data, design and implement parallel software, and demonstrate the scaled performance of an end-to-end application.
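As an illustration of what demonstrating scaled performance can look like, here is a minimal Python sketch that turns measured wall-clock times into speedup and parallel efficiency. The function name `scaling_table` and the timings in the usage example are hypothetical and are not part of the project requirements; speedup is taken as the one-worker time divided by the time with p workers, and efficiency as speedup divided by p.

```python
def scaling_table(timings):
    """Compute speedup and efficiency from measured run times.

    `timings` maps a worker count to wall-clock seconds and must
    include an entry for 1 worker, which is used as the baseline.
    Returns a list of (workers, speedup, efficiency) tuples.
    """
    baseline = timings[1]
    rows = []
    for workers in sorted(timings):
        speedup = baseline / timings[workers]
        efficiency = speedup / workers
        rows.append((workers, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    measured = {1: 120.0, 2: 63.0, 4: 34.0, 8: 20.0}
    for workers, speedup, efficiency in scaling_table(measured):
        print(f"{workers:>2} workers: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

A table like this, built from your own measurements, makes it easy to see where adding workers stops paying off.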


Project Requirements

Your project should consider the following aspects:

Note that your project is not required to include all of these aspects. However, projects that do include more of the listed aspects or a higher level of difficulty will be weighted accordingly.

Group Size

Students are required to form teams and to partition the work among the team members. The final project must be done in teams with 3-4 students each (exceptions by permission of the instructor). You can use the course forum to find prospective team members. You may also find and discuss project ideas on the forum. In general, we do not anticipate that the grades for each group member will be different. However, we reserve the right to assign different grades to each group member if it becomes apparent that one of them put in a vastly different amount of effort than the others.

Project Milestones

There are five milestones for your final project. It is critical to note that no extensions will be given for any of these milestones for any reason. Projects submitted after the final due date will not be graded.

Project Proposal

Your group needs to present a project proposal (and submit the PDF of the presentation) with the following sections:

You will have 5, and ONLY 5, minutes to briefly summarize your proposal. You have to prepare 2-3 slides for your proposal. We will enforce the 5-minute time limit.

Project Progress (Design)

Your group needs to present a progress report (and submit the PDF of the presentation) covering the main aspects of the design of the parallel application, with the following sections:

You will have 5, and ONLY 5, minutes to briefly summarize your progress. You have to prepare 2-3 slides for your progress. We will enforce the 5-minute time limit.

Project Deliverables

Project Web Site

An important piece of your final project is a public web site that describes all the great work you did for your project. The web site serves as the final project report, and needs to describe your complete project. You can use GitHub Pages, or the README file on the GitHub repository, so you can easily refer to the software at the GitHub repository. You should assume the reader has no prior knowledge of your project and has not read your proposal. It should address the following aspects:

Your web page should include screenshots of your software that demonstrate how it functions. You should include a link to your source code.

Project Software

Your final project can be implemented using any API or programming language you would like. Make your own repository on GitHub with a link to your project web page. The software, evaluation data sets, and test cases should be available on the repo. Include a README that describes the code and application files, and how your program should be run. We will be grading these projects on a variety of platforms, so you must include detailed instructions on how to compile and run your code. If we cannot run your application from the instructions included with your submission, we will not be able to grade this portion of your project. Your performance results should be reproducible, so you should provide all the information about the system and the environment needed to reproduce your tests.
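One possible way to record part of the system and environment information is a small script whose output you paste into the README. The sketch below uses only the Python standard library; the function name `describe_environment` and the choice of fields are assumptions for illustration, and a real project would likely also record compiler versions, library versions, and the AWS instance type or cluster node configuration.

```python
import json
import os
import platform


def describe_environment():
    """Collect basic facts about the machine running the tests."""
    return {
        "os": platform.platform(),          # OS name, release and version
        "machine": platform.machine(),      # hardware architecture, e.g. x86_64
        "processor": platform.processor(),  # may be an empty string on some systems
        "cpu_count": os.cpu_count(),        # logical CPUs visible to the OS
        "python_version": platform.python_version(),
    }


if __name__ == "__main__":
    # Print as JSON so the output can be copied into the README as-is.
    print(json.dumps(describe_environment(), indent=2))
```

Running the script on every platform you report results for keeps the recorded environment consistent across runs.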

Project Presentations

You will have 10, and ONLY 10, minutes to briefly present your project, followed by 5 minutes of discussion time. You may prepare 4-5 slides for your summary, but we will enforce the 10-minute time limit. Focus the majority of your presentation on your main contributions rather than on technical details. What do you feel is the coolest part of your project? What insights did you gain? What is the single most important thing you would like to show the class? Upload the presentation to the GitHub repo and to Canvas.

Project Grading

The final project grades are dependent on the following criteria:

Projects will be graded on the depth of the work undertaken and on communication (web site, presentation):

Extra points may be earned for the use of advanced features like:

Examples from Previous Years