CS-109A Introduction to Data Science

Lab 12: Building and Regularizing your first Neural Network

Harvard University
Fall 2019
Instructors: Pavlos Protopapas, Kevin Rader, Chris Tanner
Lab Instructors: Chris Tanner and Eleni Kaxiras.
Authors: Eleni Kaxiras, David Sondak, and Pavlos Protopapas.

In [ ]:
## RUN THIS CELL TO PROPERLY HIGHLIGHT THE EXERCISES
import requests
from IPython.core.display import HTML
styles = requests.get("https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css").text
HTML(styles)
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import pandas as pd
%matplotlib inline

from PIL import Image
In [ ]:
from __future__ import absolute_import, division, print_function, unicode_literals

# TensorFlow and tf.keras
import tensorflow as tf

tf.keras.backend.clear_session()  # For easy reset of notebook state.

print(tf.__version__)  # You should see a 2.0.0 here!

Picking up where we left off with tf.keras in TensorFlow 2.0:

tf.keras.models.Sequential
tf.keras.layers.Dense, tf.keras.layers.Activation, 
tf.keras.layers.Dropout, tf.keras.layers.Flatten, tf.keras.layers.Reshape
tf.keras.optimizers.SGD
tf.keras.preprocessing.image.ImageDataGenerator
tf.keras.regularizers
tf.keras.datasets.mnist

Learning Goals

In this lab we will continue with the basics of feedforward neural networks: we will create one and explore various ways to optimize and regularize it using tf.keras, a deep learning library inside the broader framework called TensorFlow. By the end of this lab, you should:

  • Understand how a simple neural network works and code some of its functionality using tf.keras.
  • Think of vectors and arrays as tensors. Learn how to do basic image manipulations.
  • Implement a simple real world example using a neural network. Find ways to improve its performance.

Part 1: Motivation

In class discussion: why do we care about Neural Nets?

Buzzwords: Linearity, Interpretability, Performance

Part 2: Data Preparation

Tensors

We can think of tensors as multidimensional arrays of real numerical values; their job is to generalize matrices to multiple dimensions.

  • scalar = just a number = rank 0 tensor ( $a \in F$ )

  • vector = 1D array = rank 1 tensor ( $x = (\,x_1, \ldots, x_n\,)^\top \in F^n$ )

  • matrix = 2D array = rank 2 tensor ( $\textbf{X} = [a_{ij}] \in F^{m \times n}$ )

  • 3D array = rank 3 tensor ( $\mathscr{X} = [t_{ijk}] \in F^{m \times n \times l}$ )

  • $N$D array = rank $N$ tensor ( $\mathscr{T} = [t_{i_1, \ldots, i_N}] \in F^{n_1 \times \ldots \times n_N}$ ) <-- Things start to get complicated here...

Tensor indexing

We can create subarrays by fixing some of the given tensor's indices. Fixing all but one index gives a vector; fixing all but two gives a 2D matrix. For example, for a third-order tensor the vectors (fibers) are

$\mathscr{X}[:,j,k]$ (column),
$\mathscr{X}[i,:,k]$ (row), and
$\mathscr{X}[i,j,:]$ (tube)
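
As a quick sanity check, here is a sketch of these fibers in numpy (the small tensor below is just an illustration):

In [ ]:
# a small rank 3 tensor to illustrate the three kinds of fibers
X = np.arange(2 * 3 * 4).reshape(2, 3, 4)

print(X[:, 1, 2])  # column: fix j=1 and k=2, let i vary
print(X[0, :, 2])  # row:    fix i=0 and k=2, let j vary
print(X[0, 1, :])  # tube:   fix i=0 and j=1, let k vary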

Tensor multiplication

We can multiply one matrix with another as long as the sizes are compatible ( (n × m) × (m × p) → (n × p) ), and we can also multiply an entire matrix by a constant. NumPy's numpy.dot performs matrix multiplication, which is straightforward when we have 1D or 2D arrays. But what about arrays with 3 or more dimensions? numpy.dot then contracts a default pair of axes (the last axis of the first array with the second-to-last axis of the second); if we want to choose the axes ourselves we should use numpy.tensordot. But, again, we do not need tensordot for this class.
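
As an illustrative sketch (the shapes here are chosen arbitrarily), this is how numpy.dot and numpy.tensordot behave on small arrays:

In [ ]:
A = np.ones((2, 3))
B = np.ones((3, 4))
print(np.dot(A, B).shape)  # (2, 4): ordinary matrix multiplication

T = np.ones((2, 3, 4))
U = np.ones((4, 5))
# for >2D arrays, dot contracts the last axis of T
# with the second-to-last axis of U
print(np.dot(T, U).shape)  # (2, 3, 5)

# tensordot lets us pick the contracted axes explicitly
print(np.tensordot(T, U, axes=([2], [0])).shape)  # (2, 3, 5), same contraction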

Reese Witherspoon as a Rank 3 Tensor

A common kind of data input to a neural network is images. Images are nice to look at, but remember, the computer only sees a series of numbers arranged in tensors. In this part we will look at how images are displayed and altered in Python.

matplotlib natively supports only .png images; it relies on a library called Pillow to handle other formats. If you do not have Pillow installed you can install it with anaconda:

conda install -c anaconda pillow 

OR 

pip install pillow

This image is from the Labeled Faces in the Wild dataset, which is used for training machine learning models. The images are 24-bit RGB images of shape (height, width, channels), with 8 bits for each of the R, G, and B channels. Explore and print the array.

In [ ]:
import matplotlib.image as mpimg

# load and show the image
FILE = '../fig/Reese_Witherspoon.jpg'
img = mpimg.imread(FILE);
imgplot = plt.imshow(img);

print(f'The image is a: {type(img)} of shape {img.shape}')
img[3:5, 3:5, :]

Slicing tensors: slice along each axis

In [ ]:
# we want to show each color channel
fig, axes = plt.subplots(1, 3, figsize=(10,10))
for i, subplot in zip(range(3), axes):
    temp = np.zeros(img.shape, dtype='uint8')
    temp[:,:,i] = img[:,:,i]
    subplot.imshow(temp)
    subplot.set_axis_off()
plt.show()

Multiplying Images with a scalar

Just for fun, no real use for this lab!

In [ ]:
# note: the image array is uint8, so multiplying by 2 wraps values
# above 255 around (modulo 256), which distorts the colors
temp = img * 2
plt.imshow(temp)

For more on image manipulation with matplotlib see: matplotlib-images

Part 3: Building an Artificial Neural Network

https://www.tensorflow.org/guide/keras

tf.keras is TensorFlow's high-level API for building and training deep learning models. It's used for fast prototyping, state-of-the-art research, and production. Keras is a library created by François Chollet. After Google released TensorFlow 2.0, the creators of Keras recommended that "Keras users who use multi-backend Keras with the TensorFlow backend switch to tf.keras in TensorFlow 2.0. tf.keras is better maintained and has better integration with TensorFlow features".

NOTE: In Keras everything starts with a Tensor of N samples as input and ends with a Tensor of N samples as output.

First you build it ...

Parts of a NN:

  • Part 1: the input layer (our dataset)

  • Part 2: the internal architecture or hidden layers (the number of layers, the activation functions, the learnable parameters and other hyperparameters)

  • Part 3: the output layer (what we want from the network - classification or regression)

... and then you train it!

  1. Load and pre-process the data
  2. Define the layers of the model.
  3. Compile the model.
  4. Fit the model to the train set (also using a validation set).
  5. Evaluate the model on the test set.
  6. We learn a lot by studying History! Plot metrics such as accuracy.
  7. Now let's use the Network for what it was meant to do: Predict on the test set!
  8. Try our model on a sandal from the Kanye West collection!
In [ ]:
# set the seed for reproducibility of results
seed = 7
np.random.seed(seed)

Fashion MNIST

Fashion-MNIST is a dataset of clothing article images (created by Zalando), consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28 x 28 grayscale image, associated with a label from 10 classes. The creators intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. Each pixel is 8 bits so its value ranges from 0 to 255.

Let's load and look at it!

1. Load and pre-process the data

In [ ]:
# get the data from keras - how convenient!
fashion_mnist = tf.keras.datasets.fashion_mnist

# load the data, already split into train and test - how nice!
(x_train, y_train),(x_test, y_test) = fashion_mnist.load_data()

# normalize the data by dividing by the maximum pixel intensity
# (each pixel is 8 bits so its value ranges from 0 to 255)
x_train, x_test = x_train / 255.0, x_test / 255.0

# classes are named 0-9 so define names for plotting clarity
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[y_train[i]])
plt.show()
In [ ]:
# choose one image to look at
plt.imshow(x_train[3], cmap=plt.cm.binary)
In [ ]:
# take a look at the array shapes
x_train.shape, x_test.shape, y_train.shape

2. Define the layers of the model.

In [ ]:
# type your code here along with instructor
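# A minimal sketch of one possible architecture (an assumption -- the
# instructor's exact model may differ; the layer sizes mirror the
# regularized model later in this lab): flatten the 28x28 image into a
# vector, pass it through two fully-connected ReLU layers, and end with
# a 10-way softmax, one output per class.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(154, activation='relu'),
  tf.keras.layers.Dense(64, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])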

3. Compile the model

In [ ]:
# type your code here along with instructor
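# A plausible compile step (an assumption -- it mirrors the regularized
# model at the end of this lab): sparse categorical cross-entropy because
# the labels are integers 0-9, and accuracy as the metric to track.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])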
In [ ]:
# print a summary of your model
model.summary()
In [ ]:
# use this cool `tf.keras` method to visualize the layers of your network
tf.keras.utils.plot_model(
    model,
    #to_file='model.png', # if you want to save the image
    show_shapes=True, # True for more details than you need
    show_layer_names=True,
    rankdir='TB',
    expand_nested=False,
    dpi=96
)

4. Fit the model to the train set (also using a validation set)

This is the part that takes the longest in terms of time and where having GPUs helps.


ep·och
noun: epoch; plural noun: epochs. A period of time in history or a person's life, typically one marked by notable events or particular characteristics. Examples: "the Victorian epoch", "my Neural Network's epochs".


In [ ]:
%%time
# type your code here along with instructor
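# One way this cell might look (the epoch count and validation_split are
# assumptions; the history plots below expect a validation set, so we
# hold out part of the training data):
history = model.fit(x_train, y_train, validation_split=0.33, epochs=10, verbose=2)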

Save the model

You can save the model so you do not have to .fit every time you restart the kernel in the notebook. Network training is expensive!

For more details on this see https://www.tensorflow.org/guide/keras/save_and_serialize

In [ ]:
# save the model so you do not have to run the code every time
model.save('fashion_model.h5')

# Recreate the exact same model purely from the file
#model = tf.keras.models.load_model('fashion_model.h5')

5. Evaluate the model on the test set.

In [ ]:
# type your code here along with instructor
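# A one-line sketch of the evaluation step: model.evaluate returns the
# loss and any compiled metrics (here, accuracy) on the test set.
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)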



# print results
print(f'Test accuracy={test_accuracy:.4f}')
if test_accuracy>0.8: print(f'Not bad!')

6. We learn a lot by studying History! Plot metrics such as accuracy.

You can learn a lot about neural networks by observing how they perform while training. You can issue callbacks in keras. The network's performance is stored in a keras callback aptly named history, which can be plotted.

In [ ]:
print(history.history.keys())
In [ ]:
# plot accuracy and loss for the train and validation sets
fig, ax = plt.subplots(1,2, figsize=(20,6))

ax[0].plot(history.history['accuracy'])
ax[0].plot(history.history['val_accuracy'])
ax[0].set_title('Model accuracy')
ax[0].set_ylabel('accuracy')
ax[0].set_xlabel('epoch')
ax[0].legend(['train', 'val'], loc='best')

ax[1].plot(history.history['loss'])
ax[1].plot(history.history['val_loss'])
ax[1].set_title('Model loss')
ax[1].set_ylabel('loss')
ax[1].set_xlabel('epoch')
ax[1].legend(['train', 'val'], loc='best')

7. Now let's use the Network for what it was meant to do: Predict on the test set!

In [ ]:
# type your code here along with instructor
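# A minimal sketch: model.predict returns, for each test image, the
# softmax probabilities over the 10 classes.
predictions = model.predict(x_test)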



# print results
print(f'These are the Network\'s predicted probabilities for each class for the first test image: \n{predictions[0]}')
print(f'Our Oracle says this is a class {np.argmax(predictions[0])}, which is a {class_names[np.argmax(predictions[0])]}')

Let's see if our network predicted correctly! Does this item really look like what was predicted?

In [ ]:
plt.figure()
plt.imshow(x_test[0], cmap=plt.cm.binary)
plt.xlabel(class_names[y_test[0]])
plt.colorbar()

Now let's see how confident our model is by plotting the probability values:

In [ ]:
# code source: https://www.tensorflow.org/tutorials/keras/classification
def plot_image(i, predictions_array, true_label, img):
    predictions_array, true_label, img = predictions_array, true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])

    plt.imshow(img, cmap=plt.cm.binary)

    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'

    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                100*np.max(predictions_array),
                                class_names[true_label]),
                                color=color)

def plot_value_array(i, predictions_array, true_label):
    predictions_array, true_label = predictions_array, true_label[i]
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)

    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')
In [ ]:
i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], y_test, x_test)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i],  y_test)
plt.show()

8. Try our model on a sandal from the Kanye West collection!

Let's see if our network can generalize beyond the Fashion-MNIST dataset. Let's give it a trendy shoe and see what it predicts. Here is the image:

shoe

In class discussion: What kinds of images can our model predict?

Buzzword: Generalization

In [ ]:
# Let's see the tensor shape
shoe = np.array(Image.open('../fig/kanye_28.jpg'))
shoe.shape
In [ ]:
# We need to delete the other 2 channels and make the image B&W.
shoe = shoe[:,:,0]
shoe.shape
In [ ]:
plt.figure()
plt.imshow(shoe, cmap=plt.cm.binary)
plt.xlabel('a cool shoe')
plt.colorbar()

tf.keras models are optimized to make predictions on a batch, or collection, of examples at once. Accordingly, even though you're using a single image, you need to add it to a list:

In [ ]:
# Add the image to a batch where it's the only member.
shoe_batch = (np.expand_dims(shoe,0))
print(shoe_batch.shape)
In [ ]:
# write the code to predict here
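# A sketch of the prediction step (note: the training images were scaled
# to [0, 1], so we scale the shoe the same way -- an easy step to forget;
# the name shoe_predictions is ours, just for illustration):
shoe_predictions = model.predict(shoe_batch / 255.0)
print(shoe_predictions[0])
print(np.argmax(shoe_predictions[0]), class_names[np.argmax(shoe_predictions[0])])
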
In class discussion: How did our model perform?

Buzzword: Convolutional Neural Networks!

Let's now try a different boot:

In [ ]:
boot = np.array(Image.open('../fig/random_boot.png'))
plt.figure()
plt.imshow(boot, cmap=plt.cm.binary)
plt.xlabel('random boot from web')
plt.colorbar()
In [ ]:
# make into one channel
boot = boot[:,:,0]
boot.shape
In [ ]:
boots = (np.expand_dims(boot,0))
print(boots.shape)
In [ ]:
predictions_single = model.predict(boots)
print(predictions_single[0])
print(np.argmax(predictions_single[0]), class_names[np.argmax(predictions_single[0])])
In [ ]:
# if it's either a sneaker or a boot we are good
if np.argmax(predictions_single[0]) in [7,9]: print(f'We did better this time!')

Regularization

Let's try adding a regularizer in our model. For more see tf.keras regularizers.

  1. Norm penalties: kernel_regularizer= tf.keras.regularizers.l2(l=0.1)
  2. Early stopping via tf.keras.callbacks. Callbacks provide a way to interact with the model while it's training and enforce some decisions automatically. Callbacks need to be instantiated and are added to the .fit() function via the callbacks argument.
  3. Dropout
In [ ]:
# callbacks
# watch validation loss and be "patient" for 30 epochs of no improvement
#es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', verbose=1, patience=30)

model_regular = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(154, activation='relu', 
                        kernel_regularizer= tf.keras.regularizers.l2(l=0.1)),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(64, activation='relu', 
                       kernel_regularizer= tf.keras.regularizers.l2(l=0.1)),
  tf.keras.layers.Dense(10, activation='softmax')
])

# compile
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

model_regular.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])
# fit
history_regular = model_regular.fit(x_train, y_train, validation_split=0.33, epochs=50,
                    verbose=2) #, callbacks=[es])
In [ ]:
test_loss, test_accuracy = model_regular.evaluate(x_test, y_test, verbose=0)
print(f'Test accuracy for regularized model={test_accuracy:.4f}')
In [ ]:
# plot accuracy and loss for the train and validation sets
fig, ax = plt.subplots(1,2, figsize=(20,6))

ax[0].plot(history_regular.history['accuracy'])
ax[0].plot(history_regular.history['val_accuracy'])
ax[0].set_title('Regularized Model accuracy')
ax[0].set_ylabel('accuracy')
ax[0].set_xlabel('epoch')
ax[0].legend(['train', 'val'], loc='best')

ax[1].plot(history_regular.history['loss'])
ax[1].plot(history_regular.history['val_loss'])
ax[1].set_title('Regularized Model loss')
ax[1].set_ylabel('loss')
ax[1].set_xlabel('epoch')
ax[1].legend(['train', 'val'], loc='best')