Topics in Applied Computation:
Advanced Practical Data Science



Spring 2020

Pavlos Protopapas


Welcome to AC295. In this course we explore advanced practices in applied data science. The course is divided into three major topics:

1. How to scale a model from a prototype (often in Jupyter notebooks) to the cloud. In this module, we cover virtual environments, containers, and virtual machines before moving on to microservices and Kubernetes. Along the way, students will also be introduced to Dask.

2. How to use existing models for transfer learning. Transfer learning is a machine learning method in which a model developed for one task is reused as the starting point for a model on a second task. It is a popular approach in deep learning, where pre-trained models serve as the starting point for computer vision and natural language processing tasks. Reusing a model matters because training neural networks for these problems from scratch requires vast compute and time, and because pre-trained models can bring large gains in performance on related problems. In this part of the course we will examine various pre-existing models and techniques in transfer learning.

3. How to use intuitive visualization tools to investigate the properties of models and diagnose their issues. We will demonstrate a number of tools, ranging from the well established (such as saliency maps) to recent ones that have appeared on https://distill.pub.
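
To make the transfer learning idea from topic 2 concrete, here is a minimal pure-Python sketch, not course material. It assumes a tiny hypothetical "pre-trained" layer (PRETRAINED_W, with weights chosen by hand for illustration) that is kept frozen, and trains only a new logistic-regression head on a second task (XOR). All names and values in it are illustrative assumptions; in practice the frozen part would be a large network loaded from a deep learning library.

```python
import math

# Hypothetical "pre-trained" weights, standing in for a layer learned on an
# earlier task. They are frozen: training below never modifies them.
PRETRAINED_W = ((1.0, -1.0), (-1.0, 1.0))


def extract_features(x):
    """Frozen layer: linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, lr=0.5, epochs=2000):
    """Fit a logistic-regression head on the frozen features by gradient descent."""
    feats = [(extract_features(x), y) for x, y in data]
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, y in feats:
            # Prediction error of the head on this example.
            err = sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias) - y
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        # Only the head's parameters are updated; PRETRAINED_W stays fixed.
        weights = [w - lr * g / len(feats) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(feats)
    return weights, bias


def predict(head, x):
    """Classify x with the frozen features plus the trained head."""
    weights, bias = head
    f = extract_features(x)
    return 1 if sum(w * fi for w, fi in zip(weights, f)) + bias > 0 else 0


# Second task (XOR): not linearly separable on the raw inputs, but separable
# on the features produced by the frozen layer.
XOR_DATA = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((1.0, 0.0), 1), ((0.0, 1.0), 1)]

if __name__ == "__main__":
    head = train_head(XOR_DATA)
    print([predict(head, x) for x, _ in XOR_DATA])
```

The design choice to note is that train_head touches only the head's weights and bias. Reusing the frozen features is what lets a simple linear model handle a task it could not solve on the raw inputs.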



Lectures: Tuesday and Thursday 4:30‐5:45 pm in Cruft 309
TFs: Michael Emanuel, Andrea Porelli, Giulia Zerbini
Office Hours: TBD