Key Word(s): feed forward neural networks, neural networks, tensorflow, keras, regularization, dropout

# CS-109A Introduction to Data Science

## Lab 12: Building and Regularizing your first Neural Network

**Harvard University**

**Fall 2019**

**Instructors:** Pavlos Protopapas, Kevin Rader, Chris Tanner

**Lab Instructors:** Chris Tanner and Eleni Kaxiras.

**Authors:** Eleni Kaxiras, David Sondak, and Pavlos Protopapas.

```
## RUN THIS CELL TO PROPERLY HIGHLIGHT THE EXERCISES
import requests
from IPython.core.display import HTML
styles = requests.get("https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css").text
HTML(styles)
```

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from PIL import Image
%matplotlib inline
```

```
from __future__ import absolute_import, division, print_function, unicode_literals
# TensorFlow and tf.keras
import tensorflow as tf
tf.keras.backend.clear_session() # For easy reset of notebook state.
print(tf.__version__) # You should see a 2.0.0 here!
```

#### Picking up where we left off: `tf.keras` with TensorFlow 2.0

```
tf.keras.models.Sequential
tf.keras.layers.Dense, tf.keras.layers.Activation,
tf.keras.layers.Dropout, tf.keras.layers.Flatten, tf.keras.layers.Reshape
tf.keras.optimizers.SGD
tf.keras.preprocessing.image.ImageDataGenerator
tf.keras.regularizers
tf.keras.datasets.mnist
```

## Learning Goals

In this lab we will continue with the basics of feedforward neural networks. We will create one and explore various ways to optimize and regularize it using `tf.keras`, a deep learning library inside the broader framework called TensorFlow. By the end of this lab, you should:

- Understand how a simple neural network works and code some of its functionality using `tf.keras`.
- Think of vectors and arrays as tensors. Learn how to do basic image manipulations.
- Implement a simple real-world example using a neural network. Find ways to improve its performance.

## Part 1: Motivation

**In class discussion : why do we care about Neural Nets?**

**Buzzwords**: Linearity, Interpretability, Performance

## Part 2: Data Preparation

### Tensors

We can think of tensors as multidimensional arrays of real numerical values; their job is to generalize matrices to multiple dimensions.

- **scalar** = just a number = rank 0 tensor ( $a \in F$ )
- **vector** = 1D array = rank 1 tensor ( $x = (\;x_1,...,x_n\;)^\top \in F^n$ )
- **matrix** = 2D array = rank 2 tensor ( $\textbf{X} = [a_{ij}] \in F^{m \times n}$ )
- **3D array** = rank 3 tensor ( $\mathscr{X} = [t_{i,j,k}] \in F^{m \times n \times l}$ )
- **$N$D array** = rank $N$ tensor ( $\mathscr{T} = [t_{i_1,...,i_N}] \in F^{n_1 \times ... \times n_N}$ ) <-- **Things start to get complicated here...**
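A quick way to build intuition is to inspect ranks and shapes in `numpy` (a minimal sketch; the variable names are ours):

```
import numpy as np

a = np.array(5.0)               # scalar: rank 0 tensor
x = np.array([1.0, 2.0, 3.0])   # vector: rank 1 tensor
X = np.array([[1, 2], [3, 4]])  # matrix: rank 2 tensor
T = np.zeros((2, 3, 4))         # 3D array: rank 3 tensor

for t in (a, x, X, T):
    print(f'rank {t.ndim}, shape {t.shape}')
```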

#### Tensor indexing

We can create subarrays by fixing some of the given tensor's indices. We can create a vector by fixing all but one index, and a 2D matrix by fixing all but two. For example, for a third order tensor the vectors are

$\mathscr{X}[:,j,k]$ (column),

$\mathscr{X}[i,:,k]$ (row), and

$\mathscr{X}[i,j,:]$ (tube)
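The same slices are easy to extract in `numpy` (a small illustration; `T` is a made-up example array):

```
T = np.arange(24).reshape(2, 3, 4)  # a rank 3 tensor of shape (2, 3, 4)

print(T[:, 1, 2])  # column: fix j and k, vary i -> shape (2,)
print(T[0, :, 2])  # row:    fix i and k, vary j -> shape (3,)
print(T[0, 1, :])  # tube:   fix i and j, vary k -> shape (4,)
```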

#### Tensor multiplication

We can multiply one matrix with another as long as the sizes are compatible ((n × m) × (m × p) = n × p), and we can also multiply an entire matrix by a constant. Numpy's `numpy.dot` performs a matrix multiplication, which is straightforward when we have 2D or 1D arrays. But what about higher-dimensional arrays? The function will choose which axes to contract according to the matching dimensions, but if we want to choose them ourselves we should use `tensordot`. But, again, we **do not need tensordot** for this class.
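To see the shape rule in action, here is a minimal `numpy.dot` check (variable names are ours):

```
A = np.random.rand(2, 3)  # shape (n, m) = (2, 3)
B = np.random.rand(3, 4)  # shape (m, p) = (3, 4)

C = np.dot(A, B)          # compatible: inner dimensions match (m = 3)
print(C.shape)            # (2, 4), i.e. (n, p)
```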

### Reese Witherspoon as a Rank 3 Tensor

A common kind of data input to a neural network is images. Images are nice to look at, but remember, the computer only sees a series of numbers arranged in `tensors`. In this part we will look at how images are displayed and altered in Python.

`matplotlib` natively supports only .png images and uses a library called `Pillow` to handle any other format. If you do not have `Pillow` installed you can do this in anaconda:

```
conda install -c anaconda pillow
OR
pip install pillow
```

This image is from the Labeled Faces in the Wild dataset, which is used for machine learning training. Images are 24-bit RGB images (height, width, channels), with 8 bits for each of the R, G, B channels. Explore and print the array.

```
import matplotlib.image as mpimg
# load and show the image
FILE = '../fig/Reese_Witherspoon.jpg'
img = mpimg.imread(FILE);
imgplot = plt.imshow(img);
print(f'The image is a: {type(img)} of shape {img.shape}')
img[3:5, 3:5, :]
```

#### Slicing tensors: slice along each axis

```
# we want to show each color channel
fig, axes = plt.subplots(1, 3, figsize=(10, 10))
for i, subplot in zip(range(3), axes):
    temp = np.zeros(img.shape, dtype='uint8')
    temp[:, :, i] = img[:, :, i]
    subplot.imshow(temp)
    subplot.set_axis_off()
plt.show()
```

#### Multiplying Images with a scalar

Just for fun, no real use for this lab!

```
# note: img is uint8, so multiplying by 2 overflows and wraps the 8-bit
# values, which is what produces the odd colors
temp = img
temp = temp * 2
plt.imshow(temp)
```

For more on image manipulation with `matplotlib` see: matplotlib-images

## Part 3: Building an Artificial Neural Network

https://www.tensorflow.org/guide/keras

`tf.keras` is TensorFlow's high-level API for building and training deep learning models. It's used for fast prototyping, state-of-the-art research, and production. `Keras` is a library created by François Chollet. After Google released TensorFlow 2.0, the creators of `keras` recommended that "Keras users who use multi-backend Keras with the TensorFlow backend switch to `tf.keras` in TensorFlow 2.0. `tf.keras` is better maintained and has better integration with TensorFlow features".

NOTE: In `Keras` everything starts with a Tensor of N samples as input and ends with a Tensor of N samples as output.

### First you build it ...

Parts of a NN:

- Part 1: the input layer (our dataset)
- Part 2: the internal architecture or hidden layers (the number of layers, the activation functions, the learnable parameters and other hyperparameters)
- Part 3: the output layer (what we want from the network - classification or regression)

### ... and then you train it!

- Load and pre-process the data
- Define the layers of the model.
- Compile the model.
- Fit the model to the train set (also using a validation set).
- Evaluate the model on the test set.
- We learn a lot by studying History! Plot metrics such as accuracy.
- Now let's use the Network for what it was meant to do: Predict on the test set!
- Try our model on a sandal from the Kanye West collection!

```
# set the seed for reproducibility of results
seed = 7
np.random.seed(seed)
```

### Fashion MNIST

**Fashion-MNIST** is a dataset of clothing article images (created by Zalando), consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a **28 x 28** grayscale image, associated with a label from **10 classes**. The creators intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. Each pixel is 8 bits so its value ranges from 0 to 255.

Let's load and look at it!

#### 1. Load and pre-process the data

```
# get the data from keras - how convenient!
fashion_mnist = tf.keras.datasets.fashion_mnist
# load the data already split into train and test sets - how nice!
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
# normalize the data by dividing by the max pixel intensity
# (each pixel is 8 bits so its value ranges from 0 to 255)
x_train, x_test = x_train / 255.0, x_test / 255.0
# classes are named 0-9 so define names for plotting clarity
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[y_train[i]])
plt.show()
```

```
# choose one image to look at
plt.imshow(x_train[3], cmap=plt.cm.binary)
```

```
# take a look at the array shapes
x_train.shape, x_test.shape, y_train.shape
```

#### 2. Define the layers of the model.

```
# type your code here along with instructor
```
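One possible way to fill in the blank (a sketch, not the official solution; the layer sizes are our choice, mirroring the regularized model defined later in this lab):

```
# a minimal sketch: flatten the 28x28 image into a 784-vector,
# two ReLU hidden layers, softmax output with one probability per class
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(154, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
```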

#### 3. Compile the model

```
# type your code here along with instructor
```
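A plausible completion (hedged; it matches the compile step used for the regularized model at the end of this lab):

```
# sparse categorical crossentropy because the labels are integers 0-9
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=loss_fn,
              metrics=['accuracy'])
```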

```
# print a summary of your model
model.summary()
```

```
# use this cool `tf.keras` method to visualize the layers of your network
tf.keras.utils.plot_model(
    model,
    # to_file='model.png',  # if you want to save the image
    show_shapes=True,       # True for more details than you need
    show_layer_names=True,
    rankdir='TB',
    expand_nested=False,
    dpi=96
)
```

#### 4. Fit the model to the train set (also using a validation set)

This is the part that takes the longest in terms of time and where having GPUs helps.

**ep·och**

noun: epoch; plural noun: epochs. A period of time in history or a person's life, typically one marked by notable events or particular characteristics. Examples: "the Victorian epoch", "my Neural Network's epochs".

```
%%time
# type your code here along with instructor
```
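A hedged sketch of the fit step (the validation split and epoch count are our choices, copied from the regularized model later in the lab):

```
# train with a validation split; `history` stores per-epoch metrics
history = model.fit(x_train, y_train,
                    validation_split=0.33,
                    epochs=50,
                    verbose=2)
```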

#### Save the model

You can save the model so you do not have to `.fit()` every time you reset the kernel in the notebook. Network training is expensive!

For more details on this see https://www.tensorflow.org/guide/keras/save_and_serialize

```
# save the model so you do not have to run the code every time
model.save('fashion_model.h5')
# Recreate the exact same model purely from the file
#model = tf.keras.models.load_model('fashion_model.h5')
```

#### 5. Evaluate the model on the test set.

```
# type your code here along with instructor
# print results
print(f'Test accuracy={test_accuracy:.4f}')
if test_accuracy>0.8: print(f'Not bad!')
```
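One way to fill in the blank above (a sketch; it mirrors the evaluation of the regularized model at the end of the lab):

```
# evaluate on the held-out test set
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
```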

#### 6. We learn a lot by studying History! Plot metrics such as accuracy.

You can learn a lot about neural networks by observing how they perform while training. You can issue `callbacks` in `keras`. The network's performance is stored in a `keras` callback aptly named `history`, which can be plotted.

```
print(history.history.keys())
```

```
# plot accuracy and loss for the training and validation sets
fig, ax = plt.subplots(1,2, figsize=(20,6))
ax[0].plot(history.history['accuracy'])
ax[0].plot(history.history['val_accuracy'])
ax[0].set_title('Model accuracy')
ax[0].set_ylabel('accuracy')
ax[0].set_xlabel('epoch')
ax[0].legend(['train', 'val'], loc='best')
ax[1].plot(history.history['loss'])
ax[1].plot(history.history['val_loss'])
ax[1].set_title('Model loss')
ax[1].set_ylabel('loss')
ax[1].set_xlabel('epoch')
ax[1].legend(['train', 'val'], loc='best')
```

#### 7. Now let's use the Network for what it was meant to do: Predict on the test set!

```
# type your code here along with instructor
# print results
print(f'These are the Network\'s predicted probabilities for each class for the first test image: \n{predictions[0]}')
print(f'Our Oracle says this is a class {np.argmax(predictions[0])}, which is a {class_names[np.argmax(predictions[0])]}')
```
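A possible completion of the blank above (hedged): `model.predict` returns one probability vector of length 10 per test image, which the prints then inspect.

```
# predict class probabilities for every image in the test set
predictions = model.predict(x_test)
```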

Let's see if our network predicted right! Does this item really look like what was predicted?

```
plt.figure()
plt.imshow(x_test[0], cmap=plt.cm.binary)
plt.xlabel(class_names[y_test[0]])
plt.colorbar()
```

Now let's see how confident our model is by plotting the probability values:

```
# code source: https://www.tensorflow.org/tutorials/keras/classification
def plot_image(i, predictions_array, true_label, img):
    predictions_array, true_label, img = predictions_array, true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img, cmap=plt.cm.binary)
    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'
    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                         100*np.max(predictions_array),
                                         class_names[true_label]),
               color=color)

def plot_value_array(i, predictions_array, true_label):
    predictions_array, true_label = predictions_array, true_label[i]
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)
    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')
```

```
i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], y_test, x_test)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i], y_test)
plt.show()
```

#### 8. Try our model on a sandal from the Kanye West collection!

Let's see if our network can generalize beyond the MNIST fashion dataset. Let's give it a trendy shoe and see what it predicts. Here is the image:

**In class discussion : What kinds of images can our model predict?**

**Buzzword**: Generalization

```
# Let's see the tensor shape
shoe = np.array(Image.open('../fig/kanye_28.jpg'))
shoe.shape
```

```
# We need to delete the other 2 channels and make the image B&W.
shoe = shoe[:,:,0]
shoe.shape
```

```
plt.figure()
plt.imshow(shoe, cmap=plt.cm.binary)
plt.xlabel('a cool shoe')
plt.colorbar()
```

`tf.keras` models are optimized to make predictions on a batch, or collection, of examples at once. Accordingly, even though you're using a single image, you need to add it to a batch:

```
# Add the image to a batch where it's the only member.
shoe_batch = (np.expand_dims(shoe,0))
print(shoe_batch.shape)
```

```
# write the code to predict here
```
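A hedged completion (the variable name is ours). Note the network was trained on pixel values scaled to [0, 1], so you may also want to divide `shoe_batch` by 255 before predicting:

```
# predict on the single-image batch
predictions_shoe = model.predict(shoe_batch)
print(np.argmax(predictions_shoe[0]), class_names[np.argmax(predictions_shoe[0])])
```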

**In class discussion : How did our model perform?**

**Buzzword:** Convolutional Neural Networks!

Let's now try a different boot:

```
boot = np.array(Image.open('../fig/random_boot.png'))
plt.figure()
plt.imshow(boot, cmap=plt.cm.binary)
plt.xlabel('random boot from web')
plt.colorbar()
```

```
# make into one channel
boot = boot[:,:,0]
boot.shape
```

```
boots = np.expand_dims(boot, 0)
print(boots.shape)
```

```
predictions_single = model.predict(boots)
print(predictions_single[0])
print(np.argmax(predictions_single[0]), class_names[np.argmax(predictions_single[0])])
```

```
# if it's either a sneaker or a boot we are good
if np.argmax(predictions_single[0]) in [7,9]: print(f'We did better this time!')
```

### Regularization

Let's try adding regularization to our model. For more, see the `tf.keras` regularizers.

- Norm penalties: `kernel_regularizer=tf.keras.regularizers.l2(l=0.1)`
- Early stopping via `tf.keras.callbacks`. Callbacks provide a way to interact with the model while it is training and enforce some decisions automatically. Callbacks need to be instantiated and are added to the `.fit()` function via the `callbacks` argument.
- Dropout

```
# callbacks
# watch the validation loss and be "patient" for 30 epochs of no improvement
# es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', verbose=1, patience=30)
model_regular = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(154, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(l=0.1)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(l=0.1)),
    tf.keras.layers.Dense(10, activation='softmax')
])
# compile
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()
model_regular.compile(optimizer=optimizer,
                      loss=loss_fn,
                      metrics=['accuracy'])
# fit
history_regular = model_regular.fit(x_train, y_train, validation_split=0.33,
                                    epochs=50, verbose=2)  # , callbacks=[es])
```

```
test_loss, test_accuracy = model_regular.evaluate(x_test, y_test, verbose=0)
print(f'Test accuracy for regularized model={test_accuracy:.4f}')
```

```
# plot accuracy and loss for the training and validation sets
fig, ax = plt.subplots(1,2, figsize=(20,6))
ax[0].plot(history_regular.history['accuracy'])
ax[0].plot(history_regular.history['val_accuracy'])
ax[0].set_title('Regularized Model accuracy')
ax[0].set_ylabel('accuracy')
ax[0].set_xlabel('epoch')
ax[0].legend(['train', 'val'], loc='best')
ax[1].plot(history_regular.history['loss'])
ax[1].plot(history_regular.history['val_loss'])
ax[1].set_title('Regularized Model loss')
ax[1].set_ylabel('loss')
ax[1].set_xlabel('epoch')
ax[1].legend(['train', 'val'], loc='best')
```