CS-109B Introduction to Data Science

Lab 5: Convolutional Neural Networks

Harvard University
Spring 2019
Lab instructor: Eleni Kaxiras
Instructors: Pavlos Protopapas and Mark Glickman
Authors: Eleni Kaxiras, Pavlos Protopapas, Patrick Ohiomoba, and Davis Sontag

In [1]:
# RUN THIS CELL TO PROPERLY HIGHLIGHT THE EXERCISES
import requests
from IPython.core.display import HTML
styles = requests.get("https://raw.githubusercontent.com/Harvard-IACS/2019-CS109B/master/content/styles/cs109.css").text
HTML(styles)
Out[1]:

Learning Goals

In this lab we will look at Convolutional Neural Networks (CNNs), and their building blocks.

By the end of this lab, you should:

  • know how to put together the building blocks used in CNNs - such as convolutional layers and pooling layers - in keras with an example.
  • have a good understanding of how images, a common type of data for CNNs, are represented in the computer, and how to think of them as arrays of numbers.
  • be familiar with preprocessing images with keras and scikit-learn.
  • use keras-vis to produce saliency maps.
  • learn best practices for configuring the hyperparameters of a CNN.
  • run your first CNN and see the error rate.
In [2]:
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (5,5)

import numpy as np
from scipy.optimize import minimize

import tensorflow as tf
import keras
from keras import layers
from keras import models
from keras import utils
from keras.layers import Dense
from keras.models import Sequential
from keras.layers import Flatten
from keras.layers import Dropout
from keras.layers import Activation
from keras.regularizers import l2
from keras.optimizers import SGD
from keras.optimizers import RMSprop
from keras import datasets
from keras.preprocessing.image import ImageDataGenerator

from keras.callbacks import LearningRateScheduler
from keras.callbacks import History

from keras import losses
from keras.datasets import mnist
from keras.utils import to_categorical

from sklearn.utils import shuffle

print(tf.VERSION)
print(tf.keras.__version__)

%matplotlib inline
1.12.0
2.1.6-tf
Using TensorFlow backend.

Prologue: keras-vis Visualization Toolkit

keras-vis is a high-level toolkit for visualizing and debugging your trained keras neural net models. Currently supported visualizations include:

  • Activation maximization
  • Saliency maps
  • Class activation maps

All visualizations support N-dimensional image inputs by default, i.e., they generalize to N-dim image inputs to your model. The toolkit is compatible with both the Theano and TensorFlow backends, with either the 'channels_first' or 'channels_last' data format.

Read the documentation at https://raghakot.github.io/keras-vis and the source at https://github.com/raghakot/keras-vis.

To install, use pip install git+https://github.com/raghakot/keras-vis.git --upgrade

SEAS JupyterHub

Instructions for Using SEAS JupyterHub

SEAS and FAS are providing you with a platform in AWS to use for the class (accessible from the 'Jupyter' menu link in Canvas). These are AWS p2 instances with a GPU, 10 GB of disk space, and 61 GB of RAM, for faster training of your networks. Most libraries, such as keras, tensorflow, and pandas, are pre-installed. If a library is missing, you may install it via the Terminal.

NOTE: The AWS platform is funded by SEAS and FAS for the purposes of the class. It is not charged against your individual credits. You are not allowed to use it for purposes not related to this course.

Help us keep this service: Make sure you stop your instance as soon as you do not need it.


Part 1: Parts of a Convolutional Neural Net

There are four main types of layers in a Convolutional Neural Network:

  • Convolutional Layers
  • Pooling Layers
  • Dropout Layers
  • Fully Connected Layers

a. Convolutional Layers.

Convolutional layers are comprised of filters and feature maps. The filters are essentially the neurons of the layer. They have the weights and produce the input for the next layer. The feature map is the output of one filter applied to the previous layer.

The fundamental difference between a densely connected layer and a convolution layer is that dense layers learn global patterns in their input feature space (for example, for an MNIST digit, patterns involving all pixels), whereas convolution layers learn local patterns: in the case of images, patterns found in small 2D windows of the inputs called receptive fields.

This key characteristic gives convnets two interesting properties:

  • The patterns they learn are translation invariant. After learning a certain pattern in the lower-right corner of a picture, a convnet can recognize it anywhere: for example, in the upper-left corner. A densely connected network would have to learn the pattern anew if it appeared at a new location. This makes convnets data efficient when processing images (because the visual world is fundamentally translation invariant): they need fewer training samples to learn representations that have generalization power.

  • They can learn spatial hierarchies of patterns. A first convolution layer will learn small local patterns such as edges, a second convolution layer will learn larger patterns made of the features of the first layers, and so on. This allows convnets to efficiently learn increasingly complex and abstract visual concepts (because the visual world is fundamentally spatially hierarchical).

Convolutions operate over 3D tensors, called feature maps, with two spatial axes (height and width) as well as a depth axis (also called the channels axis). For an RGB image, the dimension of the depth axis is 3, because the image has three color channels: red, green, and blue. For a black-and-white picture, like the MNIST digits, the depth is 1 (levels of gray). The convolution operation extracts patches from its input feature map and applies the same transformation to all of these patches, producing an output feature map. This output feature map is still a 3D tensor: it has a width and a height. Its depth can be arbitrary, because the output depth is a parameter of the layer, and the different channels in that depth axis no longer stand for specific colors as in RGB input; rather, they stand for filters. Filters encode specific aspects of the input data: at a high level, a single filter could encode the concept “presence of a face in the input,” for instance.

In the MNIST example that we will see, the first convolution layer takes a feature map of size (28, 28, 1) and outputs a feature map of size (26, 26, 32): it computes 32 filters over its input. Each of these 32 output channels contains a 26×26 grid of values, which is a response map of the filter over the input, indicating the response of that filter pattern at different locations in the input.

Convolutions are defined by two key parameters:

  • Size of the patches extracted from the inputs. These are typically 3×3 or 5×5.
  • The number of filters computed by the convolution.

Padding: one of "valid", "causal", or "same" (case-insensitive). "valid" means no padding. "same" pads the input so that the output has the same spatial size as the original input. "causal" results in causal (dilated) convolutions and is supported only for 1D (temporal) convolutions.

In keras see convolutional layers

keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', activation=None, use_bias=True, kernel_initializer='glorot_uniform', data_format='channels_last', bias_initializer='zeros')
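
A minimal sketch (not part of the original lab code), showing how a single Conv2D layer could be declared and how its output shape follows from the filter size, stride, and padding; the input shape and number of filters below are arbitrary illustrations:

from keras import layers, models

# With padding='valid' and stride 1, the spatial output size is
# (input_size - kernel_size) / stride + 1 = (32 - 3) / 1 + 1 = 30.
conv_demo = models.Sequential()
conv_demo.add(layers.Conv2D(16, (3, 3), strides=(1, 1), padding='valid',
                            activation='relu', input_shape=(32, 32, 3)))
conv_demo.summary()  # output shape (None, 30, 30, 16); 16 * (3*3*3 + 1) = 448 parameters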

How are the values in feature maps calculated?


Exercise 1:

  • Compute the operations by hand (assuming zero padding and same arrays for all channels) to produce the first element of the 4x4 feature map. How did we get the 4x4 output size?
  • Write this Conv layer in keras

-- your answer here

b. Pooling Layers.

Pooling layers are also comprised of filters and feature maps. Let's say the pooling layer has a 2x2 receptive field and a stride of 2. This stride results in feature maps that are one half the size of the input feature maps. We can use a max() operation for each receptive field.

In keras see pooling layers

keras.layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None)
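
As another small sketch (with made-up shapes), a 2x2 max pooling layer with its default stride (equal to the pool size) halves each spatial dimension, leaves the channel depth unchanged, and adds no trainable weights:

from keras import layers, models

pool_demo = models.Sequential()
pool_demo.add(layers.MaxPooling2D(pool_size=(2, 2), input_shape=(26, 26, 32)))
pool_demo.summary()  # output shape (None, 13, 13, 32); 0 parameters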


c. Dropout Layers.

Dropout consists of randomly setting a fraction rate of the input units to 0 at each update during training time, which helps prevent overfitting.

In keras see Dropout layers

keras.layers.Dropout(rate, seed=None)

rate: float between 0 and 1. Fraction of the input units to drop.
seed: A Python integer to use as random seed.
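
A short sketch (the rate and shapes below are arbitrary): Dropout is typically placed in the classifier head, for example between Flatten and Dense layers; it has no weights and is only active during training:

from keras import layers, models

drop_demo = models.Sequential()
drop_demo.add(layers.Flatten(input_shape=(13, 13, 32)))
drop_demo.add(layers.Dropout(0.5, seed=42))  # randomly zero half of the inputs at each training update
drop_demo.add(layers.Dense(10, activation='softmax'))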

References

Dropout: A Simple Way to Prevent Neural Networks from Overfitting

d. Fully Connected Layers.

Before the fully connected layers, the stack of feature maps is flattened into a vector. Fully connected (Dense) layers then map this vector to the outputs, where a sigmoid or softmax activation produces class probabilities.

In keras see FC layers

keras.layers.Dense(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros')

IT'S ALL ABOUT THE HYPERPARAMETERS!

  • stride
  • size of filter
  • number of filters
  • poolsize

Part 2: Preprocessing the data

Taking a look at how images are represented in a computer using a photo of a Picasso sculpture

In [3]:
img = plt.imread('data/picasso.png')
img.shape
Out[3]:
(150, 200, 4)
In [4]:
img[1,:,1]
Out[4]:
array([0.5411765 , 0.5372549 , 0.53333336, 0.5372549 , 0.5372549 ,
       0.5372549 , 0.54509807, 0.4509804 , 0.41568628, 0.4392157 ,
       0.4509804 , 0.5372549 , 0.54901963, 0.54901963, 0.54509807,
       0.54901963, 0.5411765 , 0.54509807, 0.54509807, 0.56078434,
       0.56078434, 0.56078434, 0.5568628 , 0.56078434, 0.5647059 ,
       0.5647059 , 0.5647059 , 0.56078434, 0.5411765 , 0.53333336,
       0.5372549 , 0.53333336, 0.5294118 , 0.53333336, 0.53333336,
       0.5294118 , 0.53333336, 0.5372549 , 0.5411765 , 0.53333336,
       0.5176471 , 0.5176471 , 0.5254902 , 0.5176471 , 0.50980395,
       0.5176471 , 0.5294118 , 0.5254902 , 0.5254902 , 0.5176471 ,
       0.5176471 , 0.5254902 , 0.53333336, 0.53333336, 0.54509807,
       0.5568628 , 0.56078434, 0.5686275 , 0.5764706 , 0.5686275 ,
       0.5686275 , 0.5764706 , 0.57254905, 0.5686275 , 0.5647059 ,
       0.56078434, 0.5529412 , 0.56078434, 0.5568628 , 0.5529412 ,
       0.5568628 , 0.56078434, 0.5568628 , 0.56078434, 0.5568628 ,
       0.54901963, 0.5411765 , 0.53333336, 0.5372549 , 0.5372549 ,
       0.53333336, 0.5294118 , 0.5372549 , 0.5411765 , 0.5411765 ,
       0.54509807, 0.5411765 , 0.54509807, 0.54509807, 0.53333336,
       0.5294118 , 0.5294118 , 0.5294118 , 0.5294118 , 0.5411765 ,
       0.5372549 , 0.5411765 , 0.54901963, 0.5411765 , 0.5254902 ,
       0.5294118 , 0.5294118 , 0.5294118 , 0.52156866, 0.5137255 ,
       0.52156866, 0.5372549 , 0.5411765 , 0.5411765 , 0.5254902 ,
       0.52156866, 0.5294118 , 0.52156866, 0.5176471 , 0.5254902 ,
       0.5294118 , 0.5294118 , 0.52156866, 0.52156866, 0.5254902 ,
       0.5294118 , 0.53333336, 0.5294118 , 0.53333336, 0.5372549 ,
       0.53333336, 0.54509807, 0.54509807, 0.5411765 , 0.54901963,
       0.5529412 , 0.5529412 , 0.54901963, 0.54509807, 0.5529412 ,
       0.54509807, 0.54901963, 0.54509807, 0.54509807, 0.54509807,
       0.5411765 , 0.54509807, 0.54901963, 0.5529412 , 0.54509807,
       0.54509807, 0.54901963, 0.5411765 , 0.54509807, 0.54901963,
       0.54901963, 0.54901963, 0.54509807, 0.54509807, 0.54509807,
       0.5372549 , 0.5411765 , 0.5411765 , 0.5372549 , 0.5372549 ,
       0.5372549 , 0.5372549 , 0.5411765 , 0.5294118 , 0.53333336,
       0.5372549 , 0.54509807, 0.5372549 , 0.53333336, 0.5294118 ,
       0.52156866, 0.5254902 , 0.50980395, 0.5176471 , 0.5176471 ,
       0.5176471 , 0.5254902 , 0.52156866, 0.5176471 , 0.5254902 ,
       0.53333336, 0.53333336, 0.5176471 , 0.50980395, 0.5176471 ,
       0.50980395, 0.5176471 , 0.5137255 , 0.5137255 , 0.5137255 ,
       0.5058824 , 0.5019608 , 0.5137255 , 0.5058824 , 0.49803922,
       0.5058824 , 0.48235294, 0.48235294, 0.47058824, 0.48235294],
      dtype=float32)
In [5]:
print(type(img[50][0][0]))
<class 'numpy.float32'>

In [6]:
# let's see the image
imgplot = plt.imshow(img)

Visualizing the channels

In [7]:
R_img = img[:,:,0]
G_img = img[:,:,1]
B_img = img[:,:,2]
plt.subplot(221)
plt.imshow(R_img, cmap=plt.cm.Reds)
plt.subplot(222)
plt.imshow(G_img, cmap=plt.cm.Greens)
plt.subplot(223)
plt.imshow(B_img, cmap=plt.cm.Blues)  
plt.subplot(224)
plt.imshow(img) 
plt.show()

More on preprocessing data below!

If you want to learn more: Image Processing with Python and Scipy
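
As a short sketch of manual preprocessing (assuming the same data/picasso.png file loaded above, which plt.imread already returns as float32 values in [0, 1]), one might drop the alpha channel, convert to grayscale with the usual luminance weights, and add batch and channel axes before feeding the image to a network:

import numpy as np
import matplotlib.pyplot as plt

img = plt.imread('data/picasso.png')          # shape (150, 200, 4)
rgb = img[..., :3]                            # drop the alpha channel
gray = rgb @ np.array([0.299, 0.587, 0.114])  # luminance-weighted grayscale -> (150, 200)
batch = gray[np.newaxis, ..., np.newaxis]     # add batch and channel axes -> (1, 150, 200, 1)
print(batch.shape, batch.dtype)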

Part 3: Putting the Parts together to make a small ConvNet Model

Let's put all the parts together to make a convnet for classifying our good old MNIST digits.

In [8]:
# Load data and preprocess
(train_images, train_labels), (test_images, test_labels) = mnist.load_data() # load MNIST data
train_images.shape
Out[8]:
(60000, 28, 28)
In [9]:
train_images.max(), train_images.min()
Out[9]:
(255, 0)
In [10]:
train_images = train_images.reshape((60000, 28, 28, 1)) # Reshape to get third dimension
train_images = train_images.astype('float32') / 255 # Normalize between 0 and 1

test_images = test_images.reshape((10000, 28, 28, 1)) # Reshape to get third dimension
test_images = test_images.astype('float32') / 255 # Normalize between 0 and 1

# Convert labels to categorical data
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
In [11]:
mnist_cnn_model = models.Sequential() # Create sequential model

# Add network layers
mnist_cnn_model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
mnist_cnn_model.add(layers.MaxPooling2D((2, 2)))
mnist_cnn_model.add(layers.Conv2D(64, (3, 3), activation='relu')) 
mnist_cnn_model.add(layers.MaxPooling2D((2, 2)))
mnist_cnn_model.add(layers.Conv2D(64, (3, 3), activation='relu'))

The next step is to feed the last output tensor (of shape (3, 3, 64)) into a densely connected classifier network like those you’re already familiar with: a stack of Dense layers. These classifiers process vectors, which are 1D, whereas the current output is a 3D tensor. First we have to flatten the 3D outputs to 1D, and then add a few Dense layers on top.

In [12]:
mnist_cnn_model.add(layers.Flatten())
mnist_cnn_model.add(layers.Dense(64, activation='relu'))
mnist_cnn_model.add(layers.Dense(10, activation='softmax'))

mnist_cnn_model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                36928     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
=================================================================
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0
_________________________________________________________________
In [13]:
# Compile model
mnist_cnn_model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Fit the model
mnist_cnn_model.fit(train_images, train_labels, epochs=5, batch_size=64)

#  Evaluate the model on the test data:
test_loss, test_acc = mnist_cnn_model.evaluate(test_images, test_labels)
test_acc
Epoch 1/5
60000/60000 [==============================] - 21s 343us/step - loss: 0.1780 - acc: 0.9456
Epoch 2/5
60000/60000 [==============================] - 21s 352us/step - loss: 0.0479 - acc: 0.9854
Epoch 3/5
60000/60000 [==============================] - 25s 419us/step - loss: 0.0341 - acc: 0.9896
Epoch 4/5
60000/60000 [==============================] - 21s 349us/step - loss: 0.0254 - acc: 0.9922
Epoch 5/5
60000/60000 [==============================] - 21s 347us/step - loss: 0.0200 - acc: 0.9941
10000/10000 [==============================] - 1s 124us/step
Out[13]:
0.9902

A densely connected network (MLP) on MNIST usually reaches a test accuracy of about 97.8%, whereas our basic convnet reaches 99.02%: we cut the error rate roughly in half (from about 2.2% to about 1%) with only 5 epochs. Not bad! But why does this simple convnet work so well compared to a densely connected model? The answer lies in how convolutional layers work, as described above.

Data Preprocessing: Meet the ImageDataGenerator class in keras (docs)

The MNIST and other pre-loaded datasets come formatted in a way that is almost ready for feeding into the model. What about plain images? They should be formatted into appropriately preprocessed floating-point tensors before being fed into the network.

The Dogs vs. Cats dataset that you'll use isn't packaged with Keras. It was made available by Kaggle as part of a computer-vision competition in late 2013, back when convnets weren't mainstream. The data has been downloaded for you from https://www.kaggle.com/c/dogs-vs-cats/data. The pictures are medium-resolution color JPEGs.

In [14]:
# TODO: set your base dir to your correct local location
base_dir = 'data/cats_and_dogs_small'

import os, shutil

# Set up directory information

train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')
test_dir = os.path.join(base_dir, 'test')

train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')

validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

test_cats_dir = os.path.join(test_dir, 'cats')
test_dogs_dir = os.path.join(test_dir, 'dogs')

print('total training cat images:', len(os.listdir(train_cats_dir))) 
print('total training dog images:', len(os.listdir(train_dogs_dir))) 
print('total validation cat images:', len(os.listdir(validation_cats_dir)))
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))
print('total test cat images:', len(os.listdir(test_cats_dir))) 
print('total test dog images:', len(os.listdir(test_dogs_dir))) 
total training cat images: 1000
total training dog images: 1000
total validation cat images: 500
total validation dog images: 500
total test cat images: 500
total test dog images: 500

So you do indeed have 2,000 training images, 1,000 validation images, and 1,000 test images. Each split contains the same number of samples from each class: this is a balanced binary-classification problem, which means classification accuracy will be an appropriate measure of success.

Building the network

In [15]:
from keras import layers
from keras import models
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_4 (Conv2D)            (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 15, 15, 128)       147584    
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 6272)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 512)               3211776   
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 513       
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________

For the compilation step, you’ll go with the RMSprop optimizer. Because you ended the network with a single sigmoid unit, you’ll use binary crossentropy as the loss.

In [16]:
from keras import optimizers
model.compile(loss='binary_crossentropy',
                      optimizer=optimizers.RMSprop(lr=1e-4),
                      metrics=['acc'])

The steps for getting the data into the network are roughly as follows:

  1. Read the picture files.
  2. Decode the JPEG content to RGB grids of pixels.
  3. Convert these into floating-point tensors.
  4. Rescale the pixel values (between 0 and 255) to the [0, 1] interval (as you know, neural networks prefer to deal with small input values).

It may seem a bit daunting, but fortunately Keras has utilities to take care of these steps automatically with the class ImageDataGenerator, which lets you quickly set up Python generators that can automatically turn image files on disk into batches of preprocessed tensors. This is what you’ll use here.

In [17]:
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.

Let’s look at the output of one of these generators: it yields batches of 150×150 RGB images (shape (20, 150, 150, 3)) and binary labels (shape (20,)). There are 20 samples in each batch (the batch size). Note that the generator yields these batches indefinitely: it loops endlessly over the images in the target folder. For this reason, you need to break the iteration loop at some point:

In [18]:
for data_batch, labels_batch in train_generator:
    print('data batch shape:', data_batch.shape)
    print('labels batch shape:', labels_batch.shape)
    break
data batch shape: (20, 150, 150, 3)
labels batch shape: (20,)

Let’s fit the model to the data using the generator. You do so using the .fit_generator method, the equivalent of .fit for data generators like this one. It expects as its first argument a Python generator that will yield batches of inputs and targets indefinitely, like this one does.

Because the data is being generated endlessly, the Keras model needs to know how many samples to draw from the generator before declaring an epoch over. This is the role of the steps_per_epoch argument: after having drawn steps_per_epoch batches from the generator—that is, after having run for steps_per_epoch gradient descent steps - the fitting process will go to the next epoch. In this case, batches are 20 samples, so it will take 100 batches until you see your target of 2,000 samples.

When using fit_generator, you can pass a validation_data argument, much as with the fit method. It's important to note that this argument is allowed to be a data generator, but it could also be a tuple of Numpy arrays. If you pass a generator as validation_data, then this generator is expected to yield batches of validation data endlessly; thus you should also specify the validation_steps argument, which tells the process how many batches to draw from the validation generator for evaluation.

In [19]:
history = model.fit_generator(
      train_generator,
      steps_per_epoch=100,
      epochs=5, # TODO: should be 30
      validation_data=validation_generator,
      validation_steps=50)


# It’s good practice to always save your models after training.
model.save('cats_and_dogs_small_1.h5')
Epoch 1/5
100/100 [==============================] - 55s 549ms/step - loss: 0.6885 - acc: 0.5320 - val_loss: 0.6711 - val_acc: 0.6220
Epoch 2/5
100/100 [==============================] - 56s 558ms/step - loss: 0.6620 - acc: 0.5950 - val_loss: 0.6500 - val_acc: 0.6170
Epoch 3/5
100/100 [==============================] - 56s 562ms/step - loss: 0.6198 - acc: 0.6510 - val_loss: 0.6771 - val_acc: 0.5790
Epoch 4/5
100/100 [==============================] - 57s 567ms/step - loss: 0.5733 - acc: 0.6955 - val_loss: 0.5993 - val_acc: 0.6740
Epoch 5/5
100/100 [==============================] - 57s 566ms/step - loss: 0.5350 - acc: 0.7305 - val_loss: 0.6140 - val_acc: 0.6520

Let’s plot the loss and accuracy of the model over the training and validation data during training:

In [20]:
fig, ax = plt.subplots(1, 1, figsize=(10,6))
ax.plot((history.history['acc']), 'r', label='train')
ax.plot((history.history['val_acc']), 'b' ,label='val')
ax.set_xlabel(r'Epoch', fontsize=20)
ax.set_ylabel(r'Accuracy', fontsize=20)
ax.legend()
ax.tick_params(labelsize=20)

Let's try data augmentation

In [21]:
datagen = ImageDataGenerator(
      rotation_range=40,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.2,
      zoom_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest')

These are just a few of the options available (for more, see the Keras documentation). Let’s quickly go over this code:

  • rotation_range is a value in degrees (0–180), a range within which to randomly rotate pictures.
  • width_shift_range and height_shift_range are ranges (as a fraction of total width or height) within which to randomly translate pictures horizontally or vertically.
  • shear_range is for randomly applying shearing transformations.
  • zoom_range is for randomly zooming inside pictures.
  • horizontal_flip is for randomly flipping half the images horizontally—relevant when there are no assumptions of horizontal asymmetry (for example, real-world pictures).
  • fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.

Let’s look at the augmented images

In [22]:
from keras.preprocessing import image
fnames = [os.path.join(train_dogs_dir, fname) for
     fname in os.listdir(train_dogs_dir)]
img_path = fnames[3] # Chooses one image to augment
img = image.load_img(img_path, target_size=(150, 150))
# Reads the image and resizes it
x = image.img_to_array(img) # Converts it to a Numpy array with shape (150, 150, 3) 
x = x.reshape((1,) + x.shape) # Reshapes it to (1, 150, 150, 3)
i=0
for batch in datagen.flow(x, batch_size=1):
    plt.figure(i)
    imgplot = plt.imshow(image.array_to_img(batch[0]))
    i += 1
    if i % 4 == 0:
        break

plt.show()

If you train a new network using this data-augmentation configuration, the network will never see the same input twice. But the inputs it sees are still heavily intercorrelated, because they come from a small number of original images—you can’t produce new information, you can only remix existing information. As such, this may not be enough to completely get rid of overfitting. To further fight overfitting, you’ll also add a Dropout layer to your model right before the densely connected classifier.

In [23]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])
In [24]:
# Let’s train the network using data augmentation and dropout.

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,)
test_datagen = ImageDataGenerator(rescale=1./255)
# Note that the validation data shouldn’t be augmented!

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

history = model.fit_generator(
      train_generator,
      steps_per_epoch=100,
      epochs=5, # TODO: should be 100
      validation_data=validation_generator,
      validation_steps=50)

model.save('cats_and_dogs_small_2.h5')
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Epoch 1/5
100/100 [==============================] - 94s 935ms/step - loss: 0.6902 - acc: 0.5294 - val_loss: 0.7003 - val_acc: 0.4924
Epoch 2/5
100/100 [==============================] - 88s 882ms/step - loss: 0.6763 - acc: 0.5703 - val_loss: 0.7350 - val_acc: 0.5090
Epoch 3/5
100/100 [==============================] - 90s 899ms/step - loss: 0.6681 - acc: 0.5816 - val_loss: 0.6458 - val_acc: 0.6098
Epoch 4/5
100/100 [==============================] - 88s 877ms/step - loss: 0.6496 - acc: 0.6241 - val_loss: 0.6431 - val_acc: 0.6192
Epoch 5/5
100/100 [==============================] - 89s 886ms/step - loss: 0.6298 - acc: 0.6369 - val_loss: 0.5897 - val_acc: 0.6580

And let's plot the results again. Thanks to data augmentation and dropout, you're no longer overfitting as badly: the training curves closely track the validation curves. With the full training schedule you reach an accuracy of about 82%, a 15% relative improvement over the non-regularized model. (Note: these numbers are for 100 epochs.)

In [25]:
fig, ax = plt.subplots(1, 1, figsize=(10,6))
ax.plot((history.history['acc']), 'r', label='train')
ax.plot((history.history['val_acc']), 'b' ,label='val')
ax.set_xlabel(r'Epoch', fontsize=20)
ax.set_ylabel(r'Accuracy', fontsize=20)
ax.legend()
ax.tick_params(labelsize=20)

By pushing regularization even further, and by tuning the network's parameters (such as the number of filters per convolution layer, or the number of layers in the network), you may be able to get an even better accuracy, likely up to 86% or 87%. But it would prove difficult to go any higher just by training your own convnet from scratch, because you have so little data to work with. As a next step to improve your accuracy on this problem, you'll have to use a pretrained model, as sketched below.
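
As a forward-looking sketch only (not part of this lab's code), "using a pretrained model" usually means reusing a convolutional base from keras.applications, for example VGG16 trained on ImageNet, freezing it, and training a small classifier on top; the Dense layer size here is an arbitrary choice:

from keras import layers, models
from keras.applications import VGG16

conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))
conv_base.trainable = False  # freeze the pretrained features

pretrained_model = models.Sequential()
pretrained_model.add(conv_base)
pretrained_model.add(layers.Flatten())
pretrained_model.add(layers.Dense(256, activation='relu'))
pretrained_model.add(layers.Dense(1, activation='sigmoid'))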

Part 4: keras-vis toolkit

In [10]:
class_idx = 0
indices = np.where(test_labels[:, class_idx] == 1.)[0]

# pick some random input from here.
idx = indices[0]

# Let's sanity check the picked image.
from matplotlib import pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (18, 6)

plt.imshow(test_images[idx][..., 0])
Out[10]:
In [11]:
input_shape=(28, 28, 1)
num_classes = 10
batch_size = 128
epochs = 5

model = Sequential()
model.add(layers.Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Dropout(0.25))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(num_classes, activation='softmax', name='preds'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

model.fit(train_images, train_labels,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(test_images, test_labels))

score = model.evaluate(test_images, test_labels, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 81s 1ms/step - loss: 0.2462 - acc: 0.9248 - val_loss: 0.0524 - val_acc: 0.9828
Epoch 2/5
60000/60000 [==============================] - 86s 1ms/step - loss: 0.0905 - acc: 0.9726 - val_loss: 0.0391 - val_acc: 0.9857
Epoch 3/5
60000/60000 [==============================] - 83s 1ms/step - loss: 0.0691 - acc: 0.9788 - val_loss: 0.0376 - val_acc: 0.9876
Epoch 4/5
60000/60000 [==============================] - 81s 1ms/step - loss: 0.0544 - acc: 0.9832 - val_loss: 0.0307 - val_acc: 0.9896
Epoch 5/5
60000/60000 [==============================] - 84s 1ms/step - loss: 0.0467 - acc: 0.9854 - val_loss: 0.0305 - val_acc: 0.9904
Test loss: 0.030515316201592212
Test accuracy: 0.9904
In [12]:
from vis.visualization import visualize_saliency
from vis.utils import utils
from keras import activations

# Utility to search for layer index by name. 
# Alternatively we can specify this as -1 since it corresponds to the last layer.
layer_idx = utils.find_layer_idx(model, 'preds')
In [13]:
plt.rcParams["figure.figsize"] = (5,5)
from vis.visualization import visualize_cam
import warnings
warnings.filterwarnings('ignore')

# Visualize the class activation map for one test example of each digit class.
for class_idx in np.arange(10):    
    indices = np.where(test_labels[:, class_idx] == 1.)[0]
    idx = indices[0]

    f, ax = plt.subplots(1, 4)
    ax[0].imshow(test_images[idx][..., 0])
    
    for i, modifier in enumerate([None, 'guided', 'relu']):
        grads = visualize_cam(model, layer_idx, filter_indices=class_idx, 
                              seed_input=test_images[idx], backprop_modifier=modifier)        
        if modifier is None:
            modifier = 'vanilla'
        ax[i+1].set_title(modifier)    
        ax[i+1].imshow(grads, cmap='jet')
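
The cell above uses visualize_cam; visualize_saliency, imported earlier, has the same call signature. A minimal sketch for a single test digit, reusing the model, layer_idx, test_images, and test_labels already defined above:

import numpy as np
from vis.visualization import visualize_saliency

indices = np.where(test_labels[:, 0] == 1.)[0]   # test examples of digit 0
grads = visualize_saliency(model, layer_idx, filter_indices=0,
                           seed_input=test_images[indices[0]])
plt.imshow(grads, cmap='jet')
plt.show()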

References and Acknowledgements

The cats and dogs part of this lab is based on Deep Learning with Python, Chapter 5, written by François Chollet, the author of Keras. It is a very practical introduction to Deep Learning and is appropriate for those with some Python knowledge who want to start with machine learning.

The saliency maps are from https://github.com/raghakot/keras-vis/blob/master/examples/mnist/attention.ipynb