CS-109A Introduction to Data Science

Lab 12: Building and Regularizing your first Neural Network

Harvard University
Fall 2019
Instructors: Pavlos Protopapas, Kevin Rader, Chris Tanner
Lab Instructors: Chris Tanner and Eleni Kaxiras.
Authors: Eleni Kaxiras, David Sondak, and Pavlos Protopapas.

In [1]:
## RUN THIS CELL TO PROPERLY HIGHLIGHT THE EXERCISES
import requests
from IPython.core.display import HTML
styles = requests.get("https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css").text
HTML(styles)
Out[1]:
In [2]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import pandas as pd
%matplotlib inline

from PIL import Image, ImageOps
In [3]:
from __future__ import absolute_import, division, print_function, unicode_literals

# TensorFlow and tf.keras
import tensorflow as tf

tf.keras.backend.clear_session()  # For easy reset of notebook state.

print(tf.__version__)  # You should see a 2.0.0 here!
2.0.0

Picking up where we left off with tf.keras and TensorFlow 2.0:

tf.keras.models.Sequential
tf.keras.layers.Dense, tf.keras.layers.Activation, 
tf.keras.layers.Dropout, tf.keras.layers.Flatten, tf.keras.layers.Reshape
tf.keras.optimizers.SGD
tf.keras.preprocessing.image.ImageDataGenerator
tf.keras.regularizers
tf.keras.datasets.mnist

Learning Goals

In this lab we will continue with the basics of feedforward neural networks: we will create one and explore various ways to optimize and regularize it using tf.keras, the deep learning API inside the broader TensorFlow framework. By the end of this lab, you should:

  • Understand how a simple neural network works and code some of its functionality using tf.keras.
  • Think of vectors and arrays as tensors. Learn how to do basic image manipulations.
  • Implement a simple real world example using a neural network. Find ways to improve its performance.

Part 1: Motivation

In-class discussion: why do we care about Neural Nets?

Buzzwords: Linearity, Interpretability, Performance

Part 2: Data Preparation

Tensors

We can think of tensors as multidimensional arrays of real numerical values; their job is to generalize matrices to multiple dimensions.

  • scalar = just a number = rank 0 tensor ( $a \in F$ )

  • vector = 1D array = rank 1 tensor ( $x = (x_1, \ldots, x_n)^\top \in F^n$ )

  • matrix = 2D array = rank 2 tensor ( $\textbf{X} = [a_{ij}] \in F^{m \times n}$ )

  • 3D array = rank 3 tensor ( $\mathscr{X} = [t_{i,j,k}] \in F^{m \times n \times l}$ )

  • $N$D array = rank $N$ tensor ( $\mathscr{T} = [t_{i_1, \ldots, i_N}] \in F^{n_1 \times \ldots \times n_N}$ ) <-- Things start to get complicated here...
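To make the ranks concrete, here is a minimal numpy sketch (the array names and values are just for illustration); numpy's ndim attribute corresponds to the tensor rank:

import numpy as np

a = np.array(5.0)               # scalar   -> rank 0
x = np.array([1.0, 2.0, 3.0])   # vector   -> rank 1
X = np.array([[1, 2], [3, 4]])  # matrix   -> rank 2
T = np.zeros((2, 3, 4))         # 3D array -> rank 3

for t in (a, x, X, T):
    print(t.shape, '-> rank', t.ndim)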

Tensor indexing

We can create subarrays by fixing some of the given tensor's indices. Fixing all but one index gives a vector (a fiber), and fixing all but two gives a 2D matrix (a slice). For example, for a third-order tensor the fibers are

$\mathscr{X}[:,j,k]$ (column fibers),
$\mathscr{X}[i,:,k]$ (row fibers), and
$\mathscr{X}[i,j,:]$ (tube fibers)
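As a small illustrative sketch (the array and the fixed indices are made up), here is how the three kinds of fibers look in numpy indexing:

import numpy as np

X = np.arange(24).reshape(2, 3, 4)  # a rank 3 tensor of shape (2, 3, 4)

col  = X[:, 1, 2]   # fix j and k -> column fiber, shape (2,)
row  = X[0, :, 2]   # fix i and k -> row fiber,    shape (3,)
tube = X[0, 1, :]   # fix i and j -> tube fiber,   shape (4,)
print(col.shape, row.shape, tube.shape)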

Tensor multiplication

We can multiply one matrix by another as long as their sizes are compatible ((n × m) × (m × p) = n × p), and we can also multiply an entire matrix by a constant. Numpy's numpy.dot performs matrix multiplication, which is straightforward for 1D or 2D arrays. But what about arrays with 3 or more dimensions? For those, numpy.dot contracts a fixed pair of axes (the last axis of the first array with the second-to-last axis of the second); if we want to choose the axes ourselves we should use numpy.tensordot. But, again, we do not need tensordot for this class.
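A minimal sketch contrasting the two (shapes chosen arbitrarily for illustration):

import numpy as np

A = np.random.rand(2, 3)    # (n x m)
B = np.random.rand(3, 4)    # (m x p)
print(np.dot(A, B).shape)   # (2, 4) -- the usual matrix product

# with tensordot we choose which axes to contract:
# here, axis 2 of T (length 4) against axis 0 of U (length 4)
T = np.random.rand(2, 3, 4)
U = np.random.rand(4, 5)
print(np.tensordot(T, U, axes=([2], [0])).shape)  # (2, 3, 5)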

Reese Witherspoon as a Rank 3 Tensor

A common kind of data input to a neural network is images. Images are nice to look at, but remember, the computer only sees a series of numbers arranged in tensors. In this part we will look at how images are displayed and altered in Python.

matplotlib natively supports only .png images; it relies on a library called Pillow to handle other formats. If you do not have Pillow installed, you can do this in anaconda:

conda install -c anaconda pillow 

OR 

pip install pillow

This image is from the Labeled Faces in the Wild dataset, used for machine learning training. Images are 24-bit RGB (height, width, channels), with 8 bits for each of the R, G, B channels. Explore and print the array.

In [4]:
import matplotlib.image as mpimg

# load and show the image
FILE = '../fig/Reese_Witherspoon.jpg'
img = mpimg.imread(FILE);
imgplot = plt.imshow(img);

print(f'The image is a: {type(img)} of shape {img.shape}')
img[3:5, 3:5, :]
The image is a: <class 'numpy.ndarray'> of shape (150, 150, 3)
Out[4]:
array([[[241, 241, 241],
        [242, 242, 242]],

       [[241, 241, 241],
        [242, 242, 242]]], dtype=uint8)

Slicing tensors: slice along each axis

In [5]:
# we want to show each color channel
fig, axes = plt.subplots(1, 3, figsize=(10,10))
for i, subplot in zip(range(3), axes):
    temp = np.zeros(img.shape, dtype='uint8')
    temp[:,:,i] = img[:,:,i]
    subplot.imshow(temp)
    subplot.set_axis_off()
plt.show()

Multiplying Images with a scalar

Just for fun, no real use for this lab!

In [6]:
temp = img.copy()
# multiplying a uint8 image by 2 overflows: values wrap around modulo 256,
# which is what produces the strange colors below
temp = temp * 2
plt.imshow(temp)
Out[6]:

For more on image manipulation by matplotlib see: matplotlib-images

Part 3: Building an Artificial Neural Network

https://www.tensorflow.org/guide/keras

tf.keras is TensorFlow's high-level API for building and training deep learning models. It's used for fast prototyping, state-of-the-art research, and production. Keras is a library created by François Chollet. After Google released TensorFlow 2.0, the Keras team recommended that "Keras users who use multi-backend Keras with the TensorFlow backend switch to tf.keras in TensorFlow 2.0. tf.keras is better maintained and has better integration with TensorFlow features".

NOTE: In Keras everything starts with a Tensor of N samples as input and ends with a Tensor of N samples as output.

First you build it ...

Parts of a NN:

  • Part 1: the input layer (our dataset)

  • Part 2: the internal architecture or hidden layers (the number of layers, the activation functions, the learnable parameters and other hyperparameters)

  • Part 3: the output layer (what we want from the network - classification or regression)

... and then you train it!

  1. Load and pre-process the data
  2. Define the layers of the model.
  3. Compile the model.
  4. Fit the model to the train set (also using a validation set).
  5. Evaluate the model on the test set.
  6. We learn a lot by studying History! Plot metrics such as accuracy.
  7. Now let's use the Network for what it was meant to do: Predict on the test set!
  8. Try our model on a sandal from the Kanye West collection!
In [7]:
# set the seed for reproducibility of results
seed = 7
np.random.seed(seed)

Fashion MNIST

Fashion-MNIST is a dataset of clothing article images (created by Zalando), consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28 x 28 grayscale image, associated with a label from 10 classes. The creators intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. Each pixel is 8 bits so its value ranges from 0 to 255.

Let's load and look at it!

1. Load and pre-process the data

In [8]:
# get the data from keras - how convenient!
fashion_mnist = tf.keras.datasets.fashion_mnist

# load the data, already split into train and test - how nice!
(x_train, y_train),(x_test, y_test) = fashion_mnist.load_data()

# normalize the data by dividing by the max pixel value
# (each pixel is 8 bits, so its value ranges from 0 to 255)
x_train, x_test = x_train / 255.0, x_test / 255.0

# classes are named 0-9 so define names for plotting clarity
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[y_train[i]])
plt.show()
In [9]:
# choose one image to look at
plt.imshow(x_train[3], cmap=plt.cm.binary)
Out[9]:
In [10]:
# take a look at the array shapes
x_train.shape, x_test.shape, y_train.shape
Out[10]:
((60000, 28, 28), (10000, 28, 28), (60000,))

2. Define the layers of the model.

In [11]:
# type your code here along with instructor
In [173]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(154, activation='relu'),
  tf.keras.layers.Dropout(0.3),
  tf.keras.layers.Dense(64, activation='relu'),
  tf.keras.layers.Dropout(0.3),
  tf.keras.layers.Dense(10, activation='softmax')
])

3. Compile the model

In [174]:
# type your code here along with instructor
In [175]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])
In [176]:
# print a summary of your model
model.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_4 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_12 (Dense)             (None, 154)               120890    
_________________________________________________________________
dropout_4 (Dropout)          (None, 154)               0         
_________________________________________________________________
dense_13 (Dense)             (None, 64)                9920      
_________________________________________________________________
dropout_5 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_14 (Dense)             (None, 10)                650       
=================================================================
Total params: 131,460
Trainable params: 131,460
Non-trainable params: 0
_________________________________________________________________
In [177]:
# use this cool `tf.keras` method to visualize the layers of your network
tf.keras.utils.plot_model(
    model,
    #to_file='model.png', # if you want to save the image
    show_shapes=True, # True for more details than you need
    show_layer_names=True,
    rankdir='TB',
    expand_nested=False,
    dpi=96
)
Out[177]:

4. Fit the model to the train set (also using a validation set)

This is the part that takes the longest in terms of time and where having GPUs helps.


ep·och
noun: epoch; plural noun: epochs. A period of time in history or a person's life, typically one marked by notable events or particular characteristics. Examples: "the Victorian epoch", "my Neural Network's epochs".


In [178]:
# type your code here along with instructor
In [179]:
%%time
# the core of the network training
history = model.fit(x_train, y_train, validation_split=0.33, epochs=50, 
                    verbose=2)
Train on 40199 samples, validate on 19801 samples
Epoch 1/50
40199/40199 - 4s - loss: 0.6735 - accuracy: 0.7575 - val_loss: 0.4546 - val_accuracy: 0.8375
Epoch 2/50
40199/40199 - 4s - loss: 0.4866 - accuracy: 0.8241 - val_loss: 0.3951 - val_accuracy: 0.8557
Epoch 3/50
40199/40199 - 4s - loss: 0.4428 - accuracy: 0.8395 - val_loss: 0.3739 - val_accuracy: 0.8642
Epoch 4/50
40199/40199 - 5s - loss: 0.4175 - accuracy: 0.8511 - val_loss: 0.3722 - val_accuracy: 0.8597
Epoch 5/50
40199/40199 - 5s - loss: 0.4018 - accuracy: 0.8535 - val_loss: 0.3595 - val_accuracy: 0.8684
Epoch 6/50
40199/40199 - 4s - loss: 0.3845 - accuracy: 0.8616 - val_loss: 0.3909 - val_accuracy: 0.8491
Epoch 7/50
40199/40199 - 6s - loss: 0.3726 - accuracy: 0.8639 - val_loss: 0.3609 - val_accuracy: 0.8683
Epoch 8/50
40199/40199 - 4s - loss: 0.3656 - accuracy: 0.8669 - val_loss: 0.3398 - val_accuracy: 0.8727
Epoch 9/50
40199/40199 - 4s - loss: 0.3553 - accuracy: 0.8699 - val_loss: 0.3348 - val_accuracy: 0.8793
Epoch 10/50
40199/40199 - 5s - loss: 0.3455 - accuracy: 0.8734 - val_loss: 0.3335 - val_accuracy: 0.8789
Epoch 11/50
40199/40199 - 4s - loss: 0.3428 - accuracy: 0.8736 - val_loss: 0.3289 - val_accuracy: 0.8804
Epoch 12/50
40199/40199 - 6s - loss: 0.3340 - accuracy: 0.8776 - val_loss: 0.3328 - val_accuracy: 0.8804
Epoch 13/50
40199/40199 - 5s - loss: 0.3283 - accuracy: 0.8792 - val_loss: 0.3223 - val_accuracy: 0.8857
Epoch 14/50
40199/40199 - 5s - loss: 0.3231 - accuracy: 0.8800 - val_loss: 0.3196 - val_accuracy: 0.8819
Epoch 15/50
40199/40199 - 5s - loss: 0.3165 - accuracy: 0.8827 - val_loss: 0.3286 - val_accuracy: 0.8830
Epoch 16/50
40199/40199 - 4s - loss: 0.3167 - accuracy: 0.8824 - val_loss: 0.3166 - val_accuracy: 0.8867
Epoch 17/50
40199/40199 - 6s - loss: 0.3106 - accuracy: 0.8849 - val_loss: 0.3464 - val_accuracy: 0.8752
Epoch 18/50
40199/40199 - 4s - loss: 0.3049 - accuracy: 0.8856 - val_loss: 0.3169 - val_accuracy: 0.8876
Epoch 19/50
40199/40199 - 5s - loss: 0.3018 - accuracy: 0.8877 - val_loss: 0.3285 - val_accuracy: 0.8854
Epoch 20/50
40199/40199 - 5s - loss: 0.3014 - accuracy: 0.8900 - val_loss: 0.3283 - val_accuracy: 0.8858
Epoch 21/50
40199/40199 - 3s - loss: 0.2971 - accuracy: 0.8879 - val_loss: 0.3240 - val_accuracy: 0.8874
Epoch 22/50
40199/40199 - 3s - loss: 0.2953 - accuracy: 0.8900 - val_loss: 0.3331 - val_accuracy: 0.8848
Epoch 23/50
40199/40199 - 3s - loss: 0.2918 - accuracy: 0.8892 - val_loss: 0.3301 - val_accuracy: 0.8836
Epoch 24/50
40199/40199 - 3s - loss: 0.2865 - accuracy: 0.8924 - val_loss: 0.3260 - val_accuracy: 0.8875
Epoch 25/50
40199/40199 - 3s - loss: 0.2816 - accuracy: 0.8946 - val_loss: 0.3181 - val_accuracy: 0.8893
Epoch 26/50
40199/40199 - 3s - loss: 0.2789 - accuracy: 0.8959 - val_loss: 0.3138 - val_accuracy: 0.8895
Epoch 27/50
40199/40199 - 3s - loss: 0.2820 - accuracy: 0.8958 - val_loss: 0.3324 - val_accuracy: 0.8891
Epoch 28/50
40199/40199 - 3s - loss: 0.2796 - accuracy: 0.8963 - val_loss: 0.3289 - val_accuracy: 0.8917
Epoch 29/50
40199/40199 - 3s - loss: 0.2771 - accuracy: 0.8977 - val_loss: 0.3287 - val_accuracy: 0.8867
Epoch 30/50
40199/40199 - 3s - loss: 0.2708 - accuracy: 0.8979 - val_loss: 0.3302 - val_accuracy: 0.8877
Epoch 31/50
40199/40199 - 3s - loss: 0.2694 - accuracy: 0.8979 - val_loss: 0.3264 - val_accuracy: 0.8907
Epoch 32/50
40199/40199 - 3s - loss: 0.2695 - accuracy: 0.8995 - val_loss: 0.3165 - val_accuracy: 0.8921
Epoch 33/50
40199/40199 - 3s - loss: 0.2656 - accuracy: 0.9011 - val_loss: 0.3341 - val_accuracy: 0.8877
Epoch 34/50
40199/40199 - 3s - loss: 0.2633 - accuracy: 0.9007 - val_loss: 0.3259 - val_accuracy: 0.8870
Epoch 35/50
40199/40199 - 3s - loss: 0.2648 - accuracy: 0.9012 - val_loss: 0.3300 - val_accuracy: 0.8906
Epoch 36/50
40199/40199 - 3s - loss: 0.2626 - accuracy: 0.9011 - val_loss: 0.3298 - val_accuracy: 0.8922
Epoch 37/50
40199/40199 - 3s - loss: 0.2601 - accuracy: 0.9032 - val_loss: 0.3201 - val_accuracy: 0.8920
Epoch 38/50
40199/40199 - 3s - loss: 0.2553 - accuracy: 0.9054 - val_loss: 0.3226 - val_accuracy: 0.8911
Epoch 39/50
40199/40199 - 3s - loss: 0.2562 - accuracy: 0.9050 - val_loss: 0.3234 - val_accuracy: 0.8909
Epoch 40/50
40199/40199 - 3s - loss: 0.2532 - accuracy: 0.9061 - val_loss: 0.3258 - val_accuracy: 0.8921
Epoch 41/50
40199/40199 - 3s - loss: 0.2521 - accuracy: 0.9068 - val_loss: 0.3406 - val_accuracy: 0.8885
Epoch 42/50
40199/40199 - 3s - loss: 0.2498 - accuracy: 0.9065 - val_loss: 0.3323 - val_accuracy: 0.8930
Epoch 43/50
40199/40199 - 3s - loss: 0.2531 - accuracy: 0.9049 - val_loss: 0.3318 - val_accuracy: 0.8911
Epoch 44/50
40199/40199 - 3s - loss: 0.2465 - accuracy: 0.9088 - val_loss: 0.3234 - val_accuracy: 0.8906
Epoch 45/50
40199/40199 - 3s - loss: 0.2453 - accuracy: 0.9073 - val_loss: 0.3406 - val_accuracy: 0.8888
Epoch 46/50
40199/40199 - 3s - loss: 0.2452 - accuracy: 0.9093 - val_loss: 0.3407 - val_accuracy: 0.8892
Epoch 47/50
40199/40199 - 3s - loss: 0.2421 - accuracy: 0.9096 - val_loss: 0.3328 - val_accuracy: 0.8924
Epoch 48/50
40199/40199 - 3s - loss: 0.2454 - accuracy: 0.9074 - val_loss: 0.3413 - val_accuracy: 0.8896
Epoch 49/50
40199/40199 - 3s - loss: 0.2400 - accuracy: 0.9103 - val_loss: 0.3315 - val_accuracy: 0.8929
Epoch 50/50
40199/40199 - 3s - loss: 0.2407 - accuracy: 0.9095 - val_loss: 0.3336 - val_accuracy: 0.8925
CPU times: user 5min 56s, sys: 1min 44s, total: 7min 40s
Wall time: 3min 1s

Save the model

You can save the model so you do not have to call .fit() every time you reset the kernel in the notebook. Network training is expensive!

For more details on this see https://www.tensorflow.org/guide/keras/save_and_serialize

In [19]:
# save the model so you do not have to run the code every time
model.save('fashion_model.h5')

# Recreate the exact same model purely from the file
#model = tf.keras.models.load_model('fashion_model.h5')
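As a sanity check, here is a minimal sketch (not executed above) of reloading the saved model and confirming it still evaluates on the test set:

# reload the model from disk and verify it performs as before
restored = tf.keras.models.load_model('fashion_model.h5')
loss, acc = restored.evaluate(x_test, y_test, verbose=0)
print(f'Restored model test accuracy: {acc:.4f}')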

5. Evaluate the model on the test set.

In [20]:
# type your code here along with instructor
In [21]:
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f'Test accuracy={test_accuracy:.4f}')
if test_accuracy>0.8: print(f'Not bad!')
Test accuracy=0.8812
Not bad!

6. We learn a lot by studying History! Plot metrics such as accuracy.

You can learn a lot about neural networks by observing how they perform while training. In keras you can do this with callbacks. The network's performance during training is stored in a History object (itself a callback, aptly named), which .fit() returns and which can be plotted.

In [22]:
print(history.history.keys())
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [23]:
# plot accuracy and loss for the test set
fig, ax = plt.subplots(1,2, figsize=(20,6))

ax[0].plot(history.history['accuracy'])
ax[0].plot(history.history['val_accuracy'])
ax[0].set_title('Model accuracy')
ax[0].set_ylabel('accuracy')
ax[0].set_xlabel('epoch')
ax[0].legend(['train', 'val'], loc='best')

ax[1].plot(history.history['loss'])
ax[1].plot(history.history['val_loss'])
ax[1].set_title('Model loss')
ax[1].set_ylabel('loss')
ax[1].set_xlabel('epoch')
ax[1].legend(['train', 'val'], loc='best')
Out[23]:

We notice that the model starts to overfit after ~10 epochs.

7. Now let's use the Network for what it was meant to do: Predict on the test set!

In [24]:
# type your code here along with instructor
In [25]:
predictions = model.predict(x_test)
print(f'These are the Network\'s predicted probabilities for each class for the first test image: \n{predictions[0]}')
print(f'Our Oracle says this is class {np.argmax(predictions[0])}, which is: {class_names[np.argmax(predictions[0])]}')
These are the Network's predicted probabilities for each class for the first test image: 
[2.3815336e-17 2.2371180e-15 2.4300434e-17 7.7377087e-15 2.2632408e-20
 3.4222462e-06 8.0396281e-14 4.1196345e-05 9.2724052e-18 9.9995542e-01]
Our Oracle says this is class 9, which is: Ankle boot

Let's see if our network predicted right! Does this item really look like what was predicted?

In [26]:
plt.figure()
plt.imshow(x_test[0], cmap=plt.cm.binary)
plt.xlabel(class_names[y_test[0]])
plt.colorbar()
Out[26]:

Now let's see how confident our model is by plotting the probability values:

In [27]:
# code source: https://www.tensorflow.org/tutorials/keras/classification
def plot_image(i, predictions_array, true_label, img):
    predictions_array, true_label, img = predictions_array, true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])

    plt.imshow(img, cmap=plt.cm.binary)

    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'

    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                100*np.max(predictions_array),
                                class_names[true_label]),
                                color=color)

def plot_value_array(i, predictions_array, true_label):
    predictions_array, true_label = predictions_array, true_label[i]
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)

    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')
In [28]:
i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], y_test, x_test)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i],  y_test)
plt.show()
In [90]:
i = 38
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], y_test, x_test)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i],  y_test)
plt.show()

The model is very confident! It predicts an ankle boot with 100% probability.

8. Try our model on a sandal from the Kanye West collection!

Let's see if our network can generalize beyond the MNIST fashion dataset. Let's give it a trendy shoe and see what it predicts. Here is the image:

shoe

In-class discussion: What kinds of images can our model predict?

Buzzword: Generalization

In [108]:
# Let's see the tensor shape
shoe = np.array(Image.open('../fig/kanye_28.jpg'))
shoe.shape
Out[108]:
(28, 28, 3)
In [109]:
# We need to delete the other 2 channels and make the image B&W.
shoe = shoe[:,:,0]
shoe.shape
Out[109]:
(28, 28)
In [110]:
plt.figure()
plt.imshow(shoe, cmap=plt.cm.binary)
plt.xlabel('a cool shoe')
plt.colorbar()
Out[110]:

tf.keras models are optimized to make predictions on a batch, or collection, of examples at once. Accordingly, even though we're using a single image, we need to add it to a batch where it's the only member:

In [111]:
# Add the image to a batch where it's the only member.
shoe_batch = (np.expand_dims(shoe,0))
print(shoe_batch.shape)
(1, 28, 28)
In [147]:
predictions_single = model.predict(shoe_batch)
print(predictions_single[0])
print(np.argmax(predictions_single[0]), class_names[np.argmax(predictions_single[0])])
[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
8 Bag

That was not classified correctly! Maybe it's because the colors are not the ones the network expects: all images in our training set had a white background. Let's invert the image and see if we fare better.

In [148]:
# invert the image: new_pixel = 255 - old_pixel
shoe = np.ones(shoe.shape) * 255 - shoe
plt.figure()
plt.imshow(shoe, cmap=plt.cm.binary)
plt.xlabel('a cool shoe')
plt.colorbar()
Out[148]:

Let's try our model on this inverted shoe:

In [149]:
# Add the image to a batch where it's the only member.
shoe_batch = (np.expand_dims(shoe,0))
print(shoe_batch.shape)
(1, 28, 28)
In [150]:
predictions_single = model.predict(shoe_batch)
print(predictions_single[0])
print(np.argmax(predictions_single[0]), class_names[np.argmax(predictions_single[0])])
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
9 Ankle boot

Well, not exactly, but close enough; the best answer would be Sandal (one student did get that in her prediction).

In-class discussion: How did our model perform?

Buzzword: Convolutional Neural Networks!

Let's now try a different boot:

In [151]:
boot = np.array(Image.open('../fig/random_boot.png'))
plt.figure()
plt.imshow(boot, cmap=plt.cm.binary)
plt.xlabel('random boot from web')
plt.colorbar()
Out[151]:
In [152]:
# make into one channel
boot = boot[:,:,0]
boot.shape
Out[152]:
(28, 28)
In [153]:
boots = (np.expand_dims(boot,0))
print(boots.shape)
(1, 28, 28)
In [172]:
predictions_single = model.predict(boots)
print(predictions_single[0])
print(np.argmax(predictions_single[0]), class_names[np.argmax(predictions_single[0])])
[0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 2.7374996e-24 0.0000000e+00 1.0000000e+00 3.6208326e-32 9.4738127e-11]
7 Sneaker
In [157]:
# if it's either a sneaker or a boot we are good
if np.argmax(predictions_single[0]) in [7,9]: print(f'We did better this time!')
We did better this time!

Regularization

Let's try adding regularization to our model. For more see tf.keras regularizers.

  1. Norm penalties: kernel_regularizer= tf.keras.regularizers.l2(l=0.1)
  2. Early stopping via tf.keras.callbacks. Callbacks provide a way to interact with the model while it's training and to enforce some decisions automatically. Callbacks need to be instantiated and are added to the .fit() call via the callbacks argument (see the sketch after this list).
  3. Dropout
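As mentioned in item 2, here is a minimal sketch of wiring an early-stopping callback into .fit() (the patience value is chosen arbitrarily for illustration; the actual run below defines such a callback but leaves it commented out):

# instantiate the callback: stop when val_loss has not improved for 5 epochs
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5,
                                      restore_best_weights=True)

# pass it to .fit() via the callbacks argument
history = model.fit(x_train, y_train, validation_split=0.33,
                    epochs=50, verbose=2, callbacks=[es])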
In [81]:
# callbacks
# watch validation loss and be "patient" for 10 epochs of no improvement
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', verbose=1, patience=10)

model_regular = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(154, activation='relu', 
                        kernel_regularizer= tf.keras.regularizers.l2(l=0.1)),
  tf.keras.layers.BatchNormalization(),
  tf.keras.layers.Dropout(0.4),
  tf.keras.layers.Dense(64, activation='relu', 
                       kernel_regularizer= tf.keras.regularizers.l2(l=0.1)),
  tf.keras.layers.BatchNormalization(),
  tf.keras.layers.Dense(10, activation='softmax')
])

# compile
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

model_regular.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])
# fit
history_regular = model_regular.fit(x_train, y_train, validation_split=0.33, epochs=50, 
                    verbose=2) #, callbacks=[es])
Train on 40199 samples, validate on 19801 samples
Epoch 1/50
40199/40199 - 5s - loss: 2.6520 - accuracy: 0.7177 - val_loss: 0.9317 - val_accuracy: 0.7763
Epoch 2/50
40199/40199 - 4s - loss: 1.0647 - accuracy: 0.7302 - val_loss: 1.4198 - val_accuracy: 0.6379
Epoch 3/50
40199/40199 - 4s - loss: 1.0388 - accuracy: 0.7340 - val_loss: 1.2246 - val_accuracy: 0.6660
Epoch 4/50
40199/40199 - 4s - loss: 1.0125 - accuracy: 0.7390 - val_loss: 0.9323 - val_accuracy: 0.7721
Epoch 5/50
40199/40199 - 4s - loss: 1.0084 - accuracy: 0.7401 - val_loss: 1.2886 - val_accuracy: 0.6326
Epoch 6/50
40199/40199 - 4s - loss: 1.0013 - accuracy: 0.7385 - val_loss: 1.1757 - val_accuracy: 0.6909
Epoch 7/50
40199/40199 - 4s - loss: 0.9864 - accuracy: 0.7453 - val_loss: 1.0294 - val_accuracy: 0.7437
Epoch 8/50
40199/40199 - 4s - loss: 0.9900 - accuracy: 0.7432 - val_loss: 0.9308 - val_accuracy: 0.7728
Epoch 9/50
40199/40199 - 4s - loss: 0.9877 - accuracy: 0.7418 - val_loss: 1.5178 - val_accuracy: 0.5823
Epoch 10/50
40199/40199 - 3s - loss: 1.0050 - accuracy: 0.7404 - val_loss: 1.1217 - val_accuracy: 0.7037
Epoch 11/50
40199/40199 - 4s - loss: 0.9929 - accuracy: 0.7413 - val_loss: 1.0637 - val_accuracy: 0.6975
Epoch 12/50
40199/40199 - 4s - loss: 0.9952 - accuracy: 0.7441 - val_loss: 1.0190 - val_accuracy: 0.6955
Epoch 13/50
40199/40199 - 3s - loss: 0.9690 - accuracy: 0.7490 - val_loss: 1.0681 - val_accuracy: 0.6973
Epoch 14/50
40199/40199 - 4s - loss: 0.9784 - accuracy: 0.7471 - val_loss: 0.9325 - val_accuracy: 0.7420
Epoch 15/50
40199/40199 - 4s - loss: 0.9663 - accuracy: 0.7455 - val_loss: 0.9807 - val_accuracy: 0.7505
Epoch 16/50
40199/40199 - 4s - loss: 0.9591 - accuracy: 0.7469 - val_loss: 1.0491 - val_accuracy: 0.7014
Epoch 17/50
40199/40199 - 4s - loss: 0.9707 - accuracy: 0.7461 - val_loss: 0.9409 - val_accuracy: 0.7291
Epoch 18/50
40199/40199 - 4s - loss: 0.9733 - accuracy: 0.7467 - val_loss: 1.0317 - val_accuracy: 0.7103
Epoch 19/50
40199/40199 - 4s - loss: 0.9743 - accuracy: 0.7451 - val_loss: 0.9768 - val_accuracy: 0.7537
Epoch 20/50
40199/40199 - 4s - loss: 0.9638 - accuracy: 0.7446 - val_loss: 1.1004 - val_accuracy: 0.6942
Epoch 21/50
40199/40199 - 4s - loss: 0.9686 - accuracy: 0.7440 - val_loss: 1.2571 - val_accuracy: 0.6854
Epoch 22/50
40199/40199 - 4s - loss: 0.9535 - accuracy: 0.7489 - val_loss: 0.9024 - val_accuracy: 0.7748
Epoch 23/50
40199/40199 - 4s - loss: 0.9529 - accuracy: 0.7490 - val_loss: 0.9575 - val_accuracy: 0.7711
Epoch 24/50
40199/40199 - 4s - loss: 0.9643 - accuracy: 0.7444 - val_loss: 0.9397 - val_accuracy: 0.7550
Epoch 25/50
40199/40199 - 4s - loss: 0.9661 - accuracy: 0.7471 - val_loss: 1.7217 - val_accuracy: 0.5002
Epoch 26/50
40199/40199 - 3s - loss: 0.9625 - accuracy: 0.7457 - val_loss: 1.0638 - val_accuracy: 0.7241
Epoch 27/50
40199/40199 - 4s - loss: 0.9605 - accuracy: 0.7429 - val_loss: 0.9840 - val_accuracy: 0.7463
Epoch 28/50
40199/40199 - 4s - loss: 0.9574 - accuracy: 0.7464 - val_loss: 0.9174 - val_accuracy: 0.7680
Epoch 29/50
40199/40199 - 4s - loss: 0.9442 - accuracy: 0.7509 - val_loss: 0.9042 - val_accuracy: 0.7563
Epoch 30/50
40199/40199 - 4s - loss: 0.9364 - accuracy: 0.7510 - val_loss: 0.9537 - val_accuracy: 0.7506
Epoch 31/50
40199/40199 - 4s - loss: 0.9459 - accuracy: 0.7507 - val_loss: 1.2036 - val_accuracy: 0.6440
Epoch 32/50
40199/40199 - 3s - loss: 0.9669 - accuracy: 0.7425 - val_loss: 0.8378 - val_accuracy: 0.7792
Epoch 33/50
40199/40199 - 4s - loss: 0.9670 - accuracy: 0.7424 - val_loss: 0.9400 - val_accuracy: 0.7351
Epoch 34/50
40199/40199 - 4s - loss: 0.9599 - accuracy: 0.7454 - val_loss: 1.1212 - val_accuracy: 0.6596
Epoch 35/50
40199/40199 - 4s - loss: 0.9547 - accuracy: 0.7468 - val_loss: 0.8319 - val_accuracy: 0.7840
Epoch 36/50
40199/40199 - 4s - loss: 0.9484 - accuracy: 0.7481 - val_loss: 0.9867 - val_accuracy: 0.6978
Epoch 37/50
40199/40199 - 4s - loss: 0.9706 - accuracy: 0.7417 - val_loss: 1.0093 - val_accuracy: 0.7357
Epoch 38/50
40199/40199 - 4s - loss: 0.9364 - accuracy: 0.7497 - val_loss: 0.8835 - val_accuracy: 0.7536
Epoch 39/50
40199/40199 - 4s - loss: 0.9294 - accuracy: 0.7500 - val_loss: 0.9051 - val_accuracy: 0.7595
Epoch 40/50
40199/40199 - 4s - loss: 0.9471 - accuracy: 0.7453 - val_loss: 1.0749 - val_accuracy: 0.7112
Epoch 41/50
40199/40199 - 4s - loss: 0.9329 - accuracy: 0.7490 - val_loss: 1.1296 - val_accuracy: 0.6971
Epoch 42/50
40199/40199 - 4s - loss: 0.9469 - accuracy: 0.7468 - val_loss: 0.8840 - val_accuracy: 0.7557
Epoch 43/50
40199/40199 - 4s - loss: 0.9208 - accuracy: 0.7519 - val_loss: 0.9376 - val_accuracy: 0.7614
Epoch 44/50
40199/40199 - 4s - loss: 0.9374 - accuracy: 0.7470 - val_loss: 0.8816 - val_accuracy: 0.7590
Epoch 45/50
40199/40199 - 4s - loss: 0.9442 - accuracy: 0.7456 - val_loss: 0.9593 - val_accuracy: 0.7444
Epoch 46/50
40199/40199 - 4s - loss: 0.9600 - accuracy: 0.7487 - val_loss: 0.8555 - val_accuracy: 0.7755
Epoch 47/50
40199/40199 - 4s - loss: 0.9480 - accuracy: 0.7448 - val_loss: 0.8525 - val_accuracy: 0.7757
Epoch 48/50
40199/40199 - 4s - loss: 0.9248 - accuracy: 0.7484 - val_loss: 0.9819 - val_accuracy: 0.7089
Epoch 49/50
40199/40199 - 3s - loss: 0.9479 - accuracy: 0.7435 - val_loss: 1.1471 - val_accuracy: 0.6752
Epoch 50/50
40199/40199 - 4s - loss: 0.9320 - accuracy: 0.7458 - val_loss: 1.0043 - val_accuracy: 0.7126
In [41]:
test_loss, test_accuracy = model_regular.evaluate(x_test, y_test, verbose=0)
print(f'Test accuracy for regularized model={test_accuracy}')
Test accuracy for regularized model=0.7746999859809875
In [42]:
# plot accuracy and loss for the test set
fig, ax = plt.subplots(1,2, figsize=(20,6))

ax[0].plot(history_regular.history['accuracy'])
ax[0].plot(history_regular.history['val_accuracy'])
ax[0].set_title('Regularized Model accuracy')
ax[0].set_ylabel('accuracy')
ax[0].set_xlabel('epoch')
ax[0].legend(['train', 'val'], loc='best')

ax[1].plot(history_regular.history['loss'])
ax[1].plot(history_regular.history['val_loss'])
ax[1].set_title('Regularized Model loss')
ax[1].set_ylabel('loss')
ax[1].set_xlabel('epoch')
ax[1].legend(['train', 'val'], loc='best')
Out[42]:

Conclusion

We notice that Dropout helped our first model achieve a 0.88 test accuracy. Our second model, which also used L2 regularization, achieved a lower accuracy. There is no simple recipe for regularizing neural nets: every network is different, and so is each task it is asked to solve.