CS109A Introduction to Data Science

Standard Section 9: Feed Forward Neural Networks

Harvard University
Fall 2019
Instructors: Pavlos Protopapas, Kevin Rader, and Chris Tanner
Section Leaders: Marios Mattheakis, Abhimanyu (Abhi) Vasishth, Robbert (Rob) Struyven

In [ ]:
#RUN THIS CELL 
import requests
from IPython.core.display import HTML
styles = requests.get("https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css").text
HTML(styles)

The goal of this section is to become familiar with the most basic Artificial Neural Network architecture: the Feed-Forward Neural Network (FFNN).

Specifically, we will:

  1. Quickly review the FFNN anatomy.
  2. Design a simple FFNN from scratch and fit simple toy datasets.
  3. Quantify the prediction (fit) by writing a loss function.
  4. Write a function for the forward pass through an FFNN with a single hidden layer with an arbitrary number of hidden neurons.
  5. Use TensorFlow and Keras to design the previous architectures.
  6. Use TensorFlow and Keras to train the network (find the optimal network parameters).

Import packages and check the version of your TensorFlow; it should be version 2.0.0.

In [ ]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
# from pandas import DataFrame

import tensorflow as tf
In [ ]:
print(tf.__version__)

IMPORTANT: Unless you have TF version 2.0.0, try the following:

pip install --upgrade pip

pip install tensorflow==2.0.0

OR

conda install tensorflow=2.0.0


1. Review of the ANN anatomy

Input, Hidden, and Output layers

The forward pass through an FFNN is a sequence of linear (affine) and nonlinear (activation) operations.
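A minimal numeric sketch of this, assuming a single hidden neuron, ReLU as the activation (introduced below), and arbitrary illustrative numbers:

x = 0.5                   # single input value
w1, b1 = 2.0, 1.0         # hidden-layer weight and bias
w2, b2 = 3.0, -1.0        # output-layer weight and bias

l1 = w1 * x + b1          # affine:     2.0*0.5 + 1.0 = 2.0
h  = max(0.0, l1)         # activation: ReLU(2.0)     = 2.0
y  = w2 * h + b2          # affine:     3.0*2.0 - 1.0 = 5.0
print(y)                  # 5.0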

2. Design a Feed Forward neural network

Let's create a simple FFNN with one input, one linear neuron as the output layer, and one hidden layer with an arbitrary number of hidden neurons. Get familiar with the forward propagation.

  • Define a nonlinear function which will be used for activation.
  • Create an FFNN with one hidden neuron and get familiar with your activation function.
  • Load the toyDataSet_1.csv and try to fit it.
  • Quantify the fit by using a loss function.
  • Make a general function for the forward pass of an FFNN with one hidden layer with an arbitrary number of hidden neurons. Always keep one input and one output. The output is a linear layer (affine transformation).
  • Load the toyDataSet_1.csv. Design an FFNN with one hidden neuron and fit.
  • Load the toyDataSet_2.csv. Design an FFNN with two hidden neurons and fit.

Define the activation function

Here, we use the Rectified Linear Unit (ReLU) function which is defined as $$g(x)=\max(0,x)$$

In [ ]:
def g(z):
    # ReLU activation; works elementwise on scalars and NumPy arrays
    return np.maximum(0, z)

# or 
# g = lambda z: np.maximum(0, z)

Build an ANN with one hidden neuron

In [ ]:
# input vector
x_train = np.linspace(-1,1,100)

# set the network parameters
w1 = 1
b1 = 0.0
w2  = 1
b2  = 0.0 

# affine operation
l1 = w1*x_train + b1

# activation
h = g(l1)

# output linear layer
y_train = w2*h+b2


plt.plot(x_train, y_train,'-b' )

Plot a few cases to understand the effect of each parameter

In [ ]:
plt.figure(figsize=[12,8])

plt.subplot(2,2,1)
w1,b1,w2,b2 = 1,0,1,0
l1 = w1*x_train + b1
y_train = w2*g(l1)+b2
plt.plot(x_train,y_train,'b')
plt.ylim([-1,1])
plt.xlim([-1,1])
plt.title('w1, b1, w2, b2 = '+ str(w1) + ', ' + str(b1)+ ', '+ str(w2) + ', ' + str(b2))
plt.grid('on')
#

plt.subplot(2,2,2)
w1,b1,w2,b2 = 1, 0.5, 1,0
l1 = w1*x_train + b1
y_train = w2*g(l1)+b2
plt.plot(x_train,y_train,'b')
plt.ylim([-1,1])
plt.xlim([-1,1])
plt.title('w1, b1, w2, b2 = '+ str(w1) + ', ' + str(b1)+ ', '+ str(w2) + ', ' + str(b2))
plt.grid('on')

#
plt.subplot(2,2,3)
w1,b1,w2,b2 = 1,0.5, 1, -0.5
l1 = w1*x_train + b1
y_train = w2*g(l1)+b2
plt.plot(x_train,y_train,'b')
plt.ylim([-1,1])
plt.xlim([-1,1])
plt.title('w1, b1, w2, b2 = '+ str(w1) + ', ' + str(b1)+ ', '+ str(w2) + ', ' + str(b2))
plt.grid('on')

#
plt.subplot(2,2,4)
w1,b1,w2,b2 = 1, 0.5, 2, -.5
l1 = w1*x_train + b1
y_train = w2*g(l1)+b2
plt.plot(x_train,y_train,'b')
plt.ylim([-1,1])
plt.xlim([-1,1])
plt.title('w1, b1, w2, b2 = '+ str(w1) + ', ' + str(b1)+ ', '+ str(w2) + ', ' + str(b2))
plt.grid('on')
plt.tight_layout()

Exercise: Fit the data

Load the toyDataSet_1.csv from the data directory. Fit the data with the above simple FFNN and plot your results.

In [ ]:
toySet_1 = pd.read_csv('../data/toyDataSet_1.csv')

x_train = toySet_1['x'].values.reshape(-1,1)
y_train = toySet_1['y'].values.reshape(-1,1)

plt.plot(x_train, y_train,'.r',label='data')
In [ ]:
# your code here


# set the network parameters


# write the network operations: affine-activation-affine
In [ ]:
# %load 'solutions/sol_1.py'

Plot the prediction and the ground truth

In [ ]:
plt.plot(x_train, y_train,'or',label='data')
plt.plot(x_train, y_model,'-b', label='FFNN' )
plt.legend()

Write the Loss function

Quantify the quality of the fit by writing a loss function. The Mean Squared Error (MSE), defined as $$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2,$$ is a good choice for regression tasks.

In [ ]:
def mseLoss(y_data, y_prediction):    
    return ((y_data - y_prediction)**2).mean()
    
In [ ]:
Loss = mseLoss(y_train,y_model)
print('MSE Loss = ', Loss)

Forward pass function

Write a function for the forward propagation through an FFNN with one input, one linear output neuron, and one hidden layer with an arbitrary number of neurons.

General Scheme:

  • One input vector: $x$
  • Affine (linear) transformation, where $w_{1},~b_{1}$ are the parameter vectors with one component $w_{1i},~b_{1i}$ per hidden neuron: $$l_{1i} = w_{1i}\,x+b_{1i}, \qquad \text{or, in vector form,} \qquad l_1 = w_1 x + b_1 = W_1\cdot X,$$ where $X=[x,~1]^T$ is the input augmented with a constant $1$ that absorbs the biases.
  • Activation (nonlinear transformation): $$h = g(l_1)$$
  • Linear Output layer with a vector for weights $w_o$ and a scalar for the bias $b_o$: $$y = w_o^T h+b_o = w_o \cdot h + b_o = W_o\cdot H$$
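Before writing the full forward-pass function below, here is a minimal shape sketch (arbitrary numbers, purely illustrative) of the parameter-matrix layout assumed here: each row of $W_1$ holds one hidden neuron's $[w_{1i},~b_{1i}]$, and $W_o$ stacks the output weights followed by the single output bias; the biases are absorbed by appending a column of ones.

# Shape sketch for 2 hidden neurons and N data points (illustration only)
N  = 5
X  = np.linspace(-1, 1, N).reshape(-1, 1)        # (N, 1) input column
W1 = np.array([[ 0.7, -0.2],                     # row i = [w_1i, b_1i]
               [-0.4,  0.1]])                    # shape (2, 2)
Wo = np.array([[ 1.2, -0.9,  0.3]])              # [w_o1, w_o2, b_o], shape (1, 3)

X_aug = np.append(X, np.ones((N, 1)), axis=1)    # (N, 2): bias column of ones
A1    = np.dot(W1, X_aug.T)                      # (2, N): pre-activations
H1    = np.maximum(0, A1)                        # (2, N): ReLU activations
H_aug = np.append(H1.T, np.ones((N, 1)), axis=1) # (N, 3): bias column again
y_hat = np.dot(Wo, H_aug.T).T                    # (N, 1): network output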
In [ ]:
def myFFNN(X, W1, Wo):
    # Forward pass through an FFNN with:
    #   input dimension  = 1
    #   output dimension = 1
    #   one hidden layer; the number of hidden neurons is set by the size of W1 (or Wo)
    # W1 : parameters (weights and biases) of the hidden layer
    # Wo : parameters (weights and bias) of the output layer

    # Input layer:
    # add a constant column for the biases to the input vector X
    ones = np.ones((len(X),1))
    l1 = np.append(X, ones, axis=1)

    # Hidden layer: affine and activation
    a1 = np.dot(W1, l1.T)
    h1 = g(a1)

    # Output (linear) layer, in 2 steps:
    # (a) add a constant column to h1 for the affine transformation
    ones = np.ones((len(X),1))
    H = np.append(h1.T, ones, axis=1).T
    # (b) affine
    a = np.dot(Wo, H)
    y_hat = a.T

    return y_hat

Use the previous parameters in our forward propagation function to fit the toyDataSet_1.csv. Plot the results and print the associated loss.

In [ ]:
w11 = 2
b11 = 0.0
w21  = 1
b21  = 0.5

# make the parameters matrices
# First layer
W1 = np.array([[w11,b11]])
# Output Layer (only one bias term)
Wo = np.array([[w21,b21]])

# run the model
y_model_1 = myFFNN(x_train, W1, Wo )

# plot the prediction and the ground truth
plt.plot(x_train, y_train,'or',label='data')
plt.plot(x_train, y_model_1,'-b', label='FFNN' )
plt.legend()

# quantify your prediction
Loss_1 = mseLoss(y_train,y_model_1)
print('MSE Loss = ', Loss_1)

Exercise: Fit a more complex dataset

Load the toyDataSet_2.csv from the data directory. Fit the data with your FFNN function and plot your results.

In [ ]:
toySet_2 = pd.read_csv('../data/toyDataSet_2.csv')

x_train2 = toySet_2['x'].values.reshape(-1,1)
y_train2 = toySet_2['y'].values.reshape(-1,1)

plt.plot(x_train2, y_train2,'.r',label='data')

Find the optimal parameters

In [ ]:
## your code here

w11 = 
b11 = 

w12 = 
b12 = 

w21 = 
w22 = 

b2  = 
In [ ]:
# %load 'solutions/sol_2.py'

Run the model, plot and quantify the prediction

In [ ]:
# make the parameters matrices
# First Layer
W1 = np.array([[w11,b11], [w12,b12]])
# Output Layer (only one bias term)
Wo = np.array([[w21,w22, b2]])


# run the model
y_model2 = myFFNN(x_train2, W1, Wo )

# plot the prediction and the ground truth
plt.plot(x_train2, y_train2,'or',label='data')
plt.plot(x_train2, y_model2,'-b', label='FFNN' )
plt.legend()

# quantify your prediction
Loss_2 = mseLoss(y_train2,y_model2)
print('MSE Loss = ', Loss_2)

More complicated functions

Explore the functions that this simple network can fit by using more neurons. Essentially, explore what functions it can generate.

In [ ]:
# Two Neurons

w11 = -.8
b11 = -.1

w12 = .4
b12 = -.1

w21  = 1.3
w22  = -.8

b2  = 0.5

# First Layer
W1 = np.array([[w11,b11], [w12,b12]])
# Output Layer (only one bias term)
Wo = np.array([[w21,w22, b2]])


# run the model
y_model_p = myFFNN(x_train2, W1, Wo )

# plot the prediction and the ground truth
plt.plot(x_train2, y_model_p,'b', label='FFNN' )
In [ ]:
# Three Neurons
w11 = -.1
b11 = .3

w12 = .9
b12 = -.1

w13 = .7
b13 = -.2


w21  = -1.
w22  = -.7
w23  = .8

b2  = 0.25

# First Layer
W1 = np.array([[w11,b11], [w12,b12], [w13,b13]])
# Output Layer (only one bias term)
Wo = np.array([[w21,w22,w23, b2]])


# run the model
y_model_p = myFFNN(x_train2, W1, Wo )

# plot the prediction and the ground truth
plt.plot(x_train2, y_model_p,'b', label='FFNN' )
In [ ]:
# Random numbers between a,b
# (b-a) * np.random.random_sample((4, 4)) + a
a = -20
b = 20

# N neurons
N = 50

# Create random parameter matrices
W1 = (b-a) * np.random.random_sample((N, 2)) + a
Wo = (b-a) * np.random.random_sample((1, N+1)) + a

# make a bigger interval
x_train_p2 = np.linspace(-2,2,1000)
x_train_p2= x_train_p2.reshape(-1,1)

# run the model
y_model_p2 = myFFNN(x_train_p2, W1, Wo )

# plot the prediction and the ground truth
plt.plot(x_train_p2, y_model_p2,'b', label='FFNN' )

3. TensorFlow and Keras

Keras, Sequential: [Source](https://keras.io/models/sequential/)

There are many frameworks that implement the forward propagation (and much more) through more complex and deep architectures. They also provide the backpropagation needed for training a network, that is, for finding the optimal parameters for a specific task.

Here, we use TensorFlow (TF) and Keras to employ FFNNs for regression tasks.

  • Use Keras to fit the toyDataSet_2 dataset. Tune the weights manually.
  • Use TF to find the optimal parameters for the same dataset. Train the FFNN by using backpropagation.

Import packages from keras

In [ ]:
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers

Read toyDataSet_2 again and redefine the weights from solution 2

In [ ]:
toySet_2 = pd.read_csv('../data/toyDataSet_2.csv')

x_train2 = toySet_2['x'].values.reshape(-1,1)
y_train2 = toySet_2['y'].values.reshape(-1,1)

# plt.plot(x_train2, y_train2,'.r',label='data')
In [ ]:
# %load 'solutions/sol_2.py'
w11 = 1
b11 = .25

w12 = -1
b12 = .25

w21  = 1
w22  = 1

b2  = -0.5

Using TensorFlow and Keras, make an FFNN to fit the toyDataSet_2. Use the same architecture and weights as previously.

In [ ]:
model = models.Sequential(name='My_two_neurons_model_fixedWeights')

# hidden layer with 2 neurons (or nodes)
model.add(layers.Dense(2, activation='relu', input_shape=(1,)))

# output layer, one neuron 
model.add(layers.Dense(1,  activation='linear'))

model.summary()

Manually set the parameters

In [ ]:
# get weights 
weights = model.get_weights()
print('Initial values of the parameters')
print(weights)

# hidden layer
weights[0][0]=np.array([ w11, w12]) #weights 
weights[1]=np.array([b11, b12]) # biases
# output layer 
weights[2]=np.array([[w21],[w22]]) # weights
weights[3] = np.array([b2])    # bias

model.set_weights(weights)

print('\nAfter setting the parameters')
print(weights)

Visualize and quantify the prediction

In [ ]:
y_model_tf1 = model.predict(x_train2)

# plot the prediction and the ground truth
plt.plot(x_train2, y_train2,'or',label='data')
plt.plot(x_train2, y_model_tf1,'-b', label='FFNN' )
plt.legend()

# quantify your prediction
Loss_tf1 = mseLoss(y_train2, y_model_tf1)
print('MSE Loss = ', Loss_tf1)

Train the network

Let TensorFlow find the optimal weights

Backpropagation

The backward pass is the training: using the chain rule, it computes the gradient of the loss with respect to each parameter and updates the parameters accordingly. The optimization is done by minimizing the loss function.
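A rough sketch of a single training step, assuming a toy one-weight, one-bias model (illustrative only; this is not the exact code tf.keras runs internally, but tf.GradientTape is the TF 2.0 mechanism for recording the forward pass and computing gradients):

# One gradient-descent step on a toy linear model (illustration only)
w = tf.Variable(1.0)
b = tf.Variable(0.0)
x = tf.constant([[0.5], [1.0]])
y = tf.constant([[1.0], [2.0]])

with tf.GradientTape() as tape:
    y_hat = w * x + b                        # forward pass
    loss  = tf.reduce_mean((y - y_hat)**2)   # MSE loss

grads = tape.gradient(loss, [w, b])          # backward pass (chain rule)
lr = 0.01
w.assign_sub(lr * grads[0])                  # move the parameters
b.assign_sub(lr * grads[1])                  # against the gradient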

Batching, stochastic gradient descent, and epochs

Shuffling the entire dataset and splitting it into mini-batches helps the optimizer escape from local minima.
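Conceptually, one epoch with mini-batches looks roughly like the sketch below (illustration only; model.fit performs the shuffling, batching, and parameter updates for you):

# Rough sketch of one training epoch with mini-batches (not run here)
batch_size = 16
n = len(x_train2)
idx = np.random.permutation(n)               # shuffle the data each epoch
for start in range(0, n, batch_size):
    batch = idx[start:start + batch_size]    # indices of one mini-batch
    x_batch, y_batch = x_train2[batch], y_train2[batch]
    # compute the loss on (x_batch, y_batch), backpropagate,
    # and update the parameters with the optimizer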

In [ ]:
model_t = models.Sequential(name='My_two_neurons_model_training')

# hidden layer with 2 neurons (or nodes)
model_t.add(layers.Dense(2, activation='relu', input_shape=(1,)))

# output layer, one neuron 
model_t.add(layers.Dense(1,  activation='linear'))

# model_t.summary()
In [ ]:
sgd = optimizers.SGD(lr=0.01)
model_t.compile(loss='MSE',optimizer=sgd) 
history = model_t.fit(x_train2, y_train2, epochs=100, batch_size=16, verbose=0)

Plot the training loss values

In [ ]:
plt.plot(history.history['loss'],'b')
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

Visualize the prediction

In [ ]:
y_model_t = model_t.predict(x_train2)

# plot the prediction and the ground truth
plt.plot(x_train2, y_train2,'.r',label='data')
plt.plot(x_train2, y_model_t,'b', label='FFNN' )
plt.legend()

# quantify your prediction
Loss_t = mseLoss(y_train2, y_model_t)
print('MSE Loss = ', Loss_t)

Check the parameters

In [ ]:
weights_t = model_t.get_weights()

print("TF weights:\n", weights_t)
print()
print("my weights:\n", weights)

Add more neurons and take a look at the prediction during training

In [ ]:
model = models.Sequential(name='My_ten_neurons_model_training')
# hidden layer with 10 neurons (or nodes)
model.add(layers.Dense(10, activation='relu', input_shape=(1,)))
# output layer, one neuron 
model.add(layers.Dense(1,  activation='linear'))

sgd = optimizers.SGD(lr=0.01)
model.compile(loss='MSE',optimizer=sgd) 

def plotting(model_t, ax, title):
    y_model_t = model_t.predict(x_train2)
    # quantify your prediction
    # plot the prediction and the ground truth
    ax.plot(x_train2, y_train2,'.r',label='data')
    ax.plot(x_train2, y_model_t,'b', label='FFNN' )
    ax.legend()
    loss = mseLoss(y_train2, y_model_t)
    ax.set_title(title + ' - MSE Loss = ' + str(np.round(loss,5)))

f, ax = plt.subplots(2,3, figsize=(16,7.5))
ax = ax.ravel()
plotting(model, ax[0], 'Epoch 0')
for i in range(5):
    model.fit(x_train2, y_train2, epochs=100, batch_size=32, verbose=0)
    plotting(model, ax[i+1], 'Epoch '+str(100*(i+1)))
In [ ]:
model.summary()

Let's fit something very nonlinear

In [ ]:
toySet_3 = pd.read_csv('../data/toyDataSet_3.csv')

x_train3 = toySet_3['x'].values.reshape(-1,1)
y_train3 = toySet_3['y'].values.reshape(-1,1)
In [ ]:
plt.plot(x_train3, y_train3,'or')

Now design an FFNN with two hidden layers with the tanh activation function

In [ ]:
model = models.Sequential(name='MyNet')

# hidden layer with 20 neurons (or nodes)
model.add(layers.Dense(20, activation='tanh', input_shape=(1,)))
# Add another hidden layer of 20 neurons
# hidden layer with 20 neurons (or nodes)
model.add(layers.Dense(20, activation='tanh'))
# output layer, one neuron 
model.add(layers.Dense(1,  activation='linear'))

# model.summary()
In [ ]:
sgd = optimizers.SGD(lr=0.02)
model.compile(loss='MSE',optimizer=sgd) 
history = model.fit(x_train3, y_train3, epochs=1000, batch_size=16, verbose=0)
In [ ]:
# Log-scale is helpful since the loss decays fast
plt.loglog(history.history['loss'],'b')
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
In [ ]:
y_model = model.predict(x_train3)

# plot the prediction and the ground truth
plt.plot(x_train3, y_train3,'.r',label='data')
plt.plot(x_train3, y_model,'b', label='FFNN' )
plt.legend()

# quantify your prediction
Loss_t = mseLoss(y_train3, y_model)
print('MSE Loss = ', Loss_t)

End of Section


Info for TensorFlow 2 and Keras

Instructions for running tf.keras with TensorFlow 2.0:

  1. Create a conda virtual environment by cloning an existing one that you know works

    conda create --name myclone --clone myenv
  2. Go to https://www.tensorflow.org/install/pip and follow instructions for your machine.

All references to Keras should be written as tf.keras. For example:

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

tf.keras.models.Sequential
tf.keras.layers.Dense, tf.keras.layers.Activation, 
tf.keras.layers.Dropout, tf.keras.layers.Flatten, tf.keras.layers.Reshape
tf.keras.optimizers.SGD
tf.keras.preprocessing.image.ImageDataGenerator
tf.keras.regularizers
tf.keras.datasets.mnist

You could avoid the long names by using

from tensorflow import keras
from tensorflow.keras import layers

These imports do not work on some systems, however, because they pick up previous versions of keras and tensorflow. That is why I avoid them in this lab.
