AC295: Advanced Practical Data Science

Lecture 6: Transfer Learning 2

Harvard University
Spring 2020
Instructors: Pavlos Protopapas
TF: Michael Emanuel, Andrea Porelli and Giulia Zerbini
Author: Andrea Porelli and Pavlos Protopapas


Part 1: Deep Learning Based Image Recognition

1.1 Image Recognition and Classification

  • Active interdisciplinary field under the umbrella of computer vision.
  • It is the task of identifying an object within an image or video/sequence.
  • Traditionally, this field leveraged advances in mathematical and computer-aided modeling and in hand-crafted object representations.
  • Several hand-annotated datasets have been developed over the years to test and evaluate image recognition systems.
  • Traditional techniques dominated the scene, iteratively improving on the task, until recently.
  • In 2012, deep learning (with AlexNet) won the ImageNet competition and opened the floodgates for rapid improvements and advancements in computer vision and deep learning techniques.

1.2 Introduction to Image Classification using Deep Learning

  • Convolutional Neural Networks (CNNs) are at the heart of this deep learning revolution for improving the task of image classification.
  • CNNs are specialized neural networks designed to handle image data.
  • Essentially a variant of feed-forward networks, CNNs infer shift- and space-invariant features through their shared-weight architecture.
  • Below is a quick overview of CNNs' components (source: Wikipedia).
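The shared-weight idea above can be made concrete without any deep learning library. Below is a minimal numpy sketch (illustrative only, not from the lecture code) of a 2D convolution (cross-correlation): one small kernel is slid over every position of the image, which is exactly why the learned features are shift-equivariant.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: the same kernel (shared weights)
    is applied at every spatial position of the image."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: left half dark, right half bright
image = np.zeros((6, 6))
image[:, 3:] = 1.0
# A small vertical-edge detector
kernel = np.array([[1.0, -1.0],
                   [1.0, -1.0]])
response = conv2d(image, kernel)   # strong response at the edge column

# Shifting the input one pixel right shifts the response the same way
shifted = np.zeros_like(image)
shifted[:, 1:] = image[:, :-1]
```

Because the same weights are reused at every location, a pattern produces the same response wherever it appears; a pooling layer on top of this is what turns shift equivariance into (approximate) shift invariance.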

1.3 Benchmark Datasets

  • Image classification is a supervised task.
  • The number of parameters in a CNN can be huge, so large labeled datasets are required for training.
  • Luckily research groups across the globe have been working toward collecting, hand annotating, and crowdsourcing different datasets.
  • These datasets are utilized to benchmark performance of different algorithms.
  • Below is a short list of widely accepted benchmarking datasets in the field of image classification:
    • ImageNet: 14M images, 20K categories, Princeton University, 2009
    • 80 Million Tiny Images dataset: 80M low-resolution images collected from the internet, 75K non-abstract nouns, MIT. It is the basis for other famous datasets, including CIFAR
    • CIFAR-10: 60K low-resolution images, 10 non-overlapping classes, Canadian Institute for Advanced Research. One of the most widely used datasets in ML
    • CIFAR-100: 60K low resolution images evenly spread across 100 classes, Canadian Institute for Advanced Research
    • Common Objects in Context (COCO): 200K images spread across different classes. It is a large-scale visual database for object detection, segmentation, and captioning
    • Open Images: 9M images with labels
    • Caltech 101 and Caltech 256: 9K and 30K images respectively, spanning across 101 and 256 classes
    • Stanford Dogs dataset: 20K color images, 120 classes. It is specific to different dog breeds
    • MNIST: 70K hand-labeled images of handwritten digits (zero to nine), split into 60K training and 10K test images. One of the most famous datasets for ML
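All of these benchmarks share the same supervised structure: an array of images paired with an array of labels. As a small, download-free stand-in, scikit-learn ships a tiny MNIST-like digits set (this loader is an illustration, not part of the lecture materials; the larger benchmarks load similarly, e.g. CIFAR-10 via tf.keras.datasets.cifar10.load_data()).

```python
from sklearn.datasets import load_digits

# Bundled MNIST-like dataset: 8x8 grayscale digit images with labels
digits = load_digits()
images, labels = digits.images, digits.target  # (1797, 8, 8) and (1797,)
```

The images/labels pairing is what makes these datasets usable both for training and for benchmarking different classifiers against each other.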

1.4 State of the Art (SOTA) Deep Image Classification Models

  • Deep learning has received much attention and hype over the years, so it is no surprise that many conferences, journals, and competitions worldwide center around it.
  • Image classification architectures have experienced iterative improvements on a regular basis.
  • Let's look at the best SOTA models:
    • AlexNet
    • VGG-16
    • Inception (GoogLeNet)
    • ResNet
    • MobileNet
    • DenseNet
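Several of these architectures are available ready-made in tf.keras.applications. A minimal sketch, assuming TensorFlow 2.x: passing weights="imagenet" would download the pretrained ImageNet weights, while weights=None below just builds the (randomly initialized) architecture so it can be inspected without a download.

```python
import tensorflow as tf

# Build MobileNet with its standard 1000-class ImageNet head.
# weights=None avoids downloading pretrained weights; use
# weights="imagenet" to get the pretrained model instead.
model = tf.keras.applications.MobileNet(weights=None,
                                        input_shape=(224, 224, 3))
```

The other architectures in the list (VGG16, ResNet50, DenseNet121, InceptionV3, etc.) are exposed through the same interface, which is what makes them convenient starting points for transfer learning.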

See Lecture 5 for more information about these SOTA models.

Part 2: Use Cases

Let's use Transfer Learning to build some applications. It is convenient to run the applications on Google Colab. Check out the links below.

2.1 Image Classification and Transfer Learning

  • You know what image classification is all about. Now we will use pretrained model(s) to understand how we can leverage transfer learning to improve upon our models.
  • In particular, we are going to focus on fine-grained image classification, which refers to the task of recognizing different subclasses within a higher-level class.
  • We will focus on the Stanford Dogs Dataset, which contains images of different dog breeds. Hence the task is to categorize different dog breeds.
  • We are going to train the model on a subset of the entire dataset.
  • The key areas of focus in this use case will be:
    • Data preparation
    • Train dog breeds classifier using Transfer Learning
  • Find more on the colab notebook Lecture 6: Transfer Learning across Tasks: Image Classification and Transfer Learning
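The transfer-learning step above can be sketched as follows (a minimal sketch, not the notebook's implementation): take a pretrained convolutional base, freeze it, and train only a new classification head for the 120 Stanford Dogs breeds. As in the previous sketch, weights="imagenet" would load pretrained weights; weights=None keeps this download-free.

```python
import tensorflow as tf

NUM_BREEDS = 120  # Stanford Dogs has 120 breed classes

# Pretrained convolutional base (headless); swap weights=None for
# weights="imagenet" to actually reuse ImageNet features.
base = tf.keras.applications.MobileNet(weights=None,
                                       include_top=False,
                                       pooling="avg",
                                       input_shape=(224, 224, 3))
base.trainable = False  # freeze the base: only the new head is trained

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_BREEDS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

With the base frozen, only the final Dense layer's weights are updated, which is why transfer learning works well even on a small subset of the full dataset.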

2.2 Style Transfer

  • Paintings present a complex interplay of content and style. Photographs are a combination of perspective and light. Let's combine them for spectacular and surprising results.
  • Our goal is to modify the original image (a bird's-eye view of the city of Venice) by adding the style, colors, and stroke patterns from a piece of art (in our case, a mosaic designed by Gaudí).
  • The outcome will be the result of a Transfer Learning algorithm presented in the paper A Neural Algorithm of Artistic Style
  • The key areas of focus in this use case will be:
    • Understanding neural style transfer
    • Image preprocessing
    • Building loss function
    • Constructing a custom optimizer
    • Style transfer in action
  • Find more on the colab notebook Lecture 6: Transfer Learning across Tasks: Style Transfer Model
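The loss functions at the heart of A Neural Algorithm of Artistic Style can be sketched in a few lines of numpy (an illustrative sketch on raw arrays, not the notebook's TensorFlow implementation): the style loss compares Gram matrices of feature maps, which capture channel correlations while discarding spatial layout, and the content loss compares feature maps directly.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (height, width, channels) feature map:
    channel-by-channel correlations summed over all spatial positions."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    return flat.T @ flat / (h * w)

def style_loss(style_feats, generated_feats):
    """Style loss for one layer: mean squared difference of Gram matrices."""
    return np.mean((gram_matrix(style_feats)
                    - gram_matrix(generated_feats)) ** 2)

def content_loss(content_feats, generated_feats):
    """Content loss: mean squared error directly on the feature maps."""
    return np.mean((content_feats - generated_feats) ** 2)
```

The total loss in the paper is a weighted sum of the two (weights alpha and beta), summed over several layers of a pretrained CNN; the optimizer then updates the pixels of the generated image rather than any network weights.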