# CS109A Introduction to Data Science

## Standard Section 9: Feed Forward Neural Networks¶

Harvard University
Fall 2019
Instructors: Pavlos Protopapas, Kevin Rader, and Chris Tanner
Section Leaders: Marios Mattheakis, Abhimanyu (Abhi) Vasishth, Robbert (Rob) Struyven

In [ ]:
#RUN THIS CELL
import requests
from IPython.core.display import HTML
styles = requests.get("https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css").text
HTML(styles)


The goal of this section is to be familiar with the most basic Artificial Neural Network architecture the Feed-Forward Neural Network (FFNN).

Specifically, we will:

1. Quick review the FFNN anatomy.
2. Design a simple FFNN from the scratch and fit simple toy-datasets.
3. Quantify the prediction (fit) by writing a loss function.
4. Write a function for the forward pass through an FFNN with a single hidden layer of arbitrary number of hidden neurons.
5. Use TensorFlow and Keras to design the previous architectures.
6. Use TensorFlow and Keras to train the network (find the optimal network parameters).

#### Import packages and check the version of your TensorFlow, it should be the version 2.0.0¶

In [ ]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
# from pandas import DataFrame

import tensorflow as tf

In [ ]:
print(tf.__version__)


#### IMPORTANT: Unless you have the "TF version 2.0.0" try the following¶

pip install --upgrade pip

pip install tensorflow==2.0.0

OR

conda install tensorflow=2.0.0

## 1. Review of the ANN anatomy¶

#### Input, Hidden, and Output layers¶

The forward pass through an FFNN is a sequence of linear (affine) and nonlinear (activation) operations.

## 2. Design a Feed Forward neural network¶

Let's create a simple FFNN with one input, one linear neuron as output layer, and one hidden layer of arbitrary number of hidden neurons. Get familiar with the forward propagation.

• Define a nonlinear function which will be used for activation.
• Create an FFNN with one hidden neuron and get familiar with your activation function.
• Load the toyDataSet_1.csv and try to fit.
• Quantify the fitting by using a loss function.
• Make a general function for the forward pass of an FFNN with one hidden layer of arbitrary number of hidden neurons. Always keep one input and one output. The output is a linear layer (affine transformation).
• Load the toyDataSet_1.csv. Design an FFNN with one hidden neurons and fit.
• Load the toyDataSet_2.csv. Design an FFNN with two hidden neurons and fit.

### Define the activation function¶

Here, we use the Rectified Linear Unit (ReLU) function which is defined as $$g(x)=\max(0,x)$$

In [ ]:
def g(z: float) -> float :
return  np.maximum(0, z)

# or
# g = lambda z: np.maximum(0, z)


### Build a ANN with one hidden neuron¶

In [ ]:
# input vector
x_train = np.linspace(-1,1,100)

# set the network parameters
w1 = 1
b1 = 0.0
w2  = 1
b2  = 0.0

# affine operation
l1 = w1*x_train + b1

# activation
h = g(l1)

# output linear layer
y_train = w2*h+b2

plt.plot(x_train, y_train,'-b' )


#### Plot a few cases to understand¶

In [ ]:
plt.figure(figsize=[12,8])

plt.subplot(2,2,1)
w1,b1,w2,b2 = 1,0,1,0
l1 = w1*x_train + b1
y_train = w2*g(l1)+b2
plt.plot(x_train,y_train,'b')
plt.ylim([-1,1])
plt.xlim([-1,1])
plt.title('w1, b1, w2, b2 = '+ str(w1) + ', ' + str(b1)+ ', '+ str(w2) + ', ' + str(b2))
plt.grid('on')
#

plt.subplot(2,2,2)
w1,b1,w2,b2 = 1, 0.5, 1,0
l1 = w1*x_train + b1
y_train = w2*g(l1)+b2
plt.plot(x_train,y_train,'b')
plt.ylim([-1,1])
plt.xlim([-1,1])
plt.title('w1, b1, w2, b2 = '+ str(w1) + ', ' + str(b1)+ ', '+ str(w2) + ', ' + str(b2))
plt.grid('on')

#
plt.subplot(2,2,3)
w1,b1,w2,b2 = 1,0.5, 1, -0.5
l1 = w1*x_train + b1
y_train = w2*g(l1)+b2
plt.plot(x_train,y_train,'b')
plt.ylim([-1,1])
plt.xlim([-1,1])
plt.title('w1, b1, w2, b2 = '+ str(w1) + ', ' + str(b1)+ ', '+ str(w2) + ', ' + str(b2))
plt.grid('on')

#
plt.subplot(2,2,4)
w1,b1,w2,b2 = 1, 0.5, 2, -.5
l1 = w1*x_train + b1
y_train = w2*g(l1)+b2
plt.plot(x_train,y_train,'b')
plt.ylim([-1,1])
plt.xlim([-1,1])
plt.title('w1, b1, w2, b2 = '+ str(w1) + ', ' + str(b1)+ ', '+ str(w2) + ', ' + str(b2))
plt.grid('on')
plt.tight_layout()


#### Exercise: Fit the data¶

Load the toyDataSet_1.csv from the data directory. Fit the data with the above simple FFNN and plot your results.

In [ ]:
toySet_1 = pd.read_csv('../data/toyDataSet_1.csv')

x_train = toySet_1['x'].values.reshape(-1,1)
y_train = toySet_1['y'].values.reshape(-1,1)

plt.plot(x_train, y_train,'.r',label='data')

In [ ]:
# your code here

# set the network parameters

# write the network operations: affine-activation-affine

In [ ]:
# %load 'solutions/sol_1.py'


#### Plot the prediction and the ground truth¶

In [ ]:
plt.plot(x_train, y_train,'or',label='data')
plt.plot(x_train, y_model,'-b', label='FFNN' )
plt.legend()


#### Write the Loss function¶

Quantify the quality of the fitting by writing a loss function. Mean Square Error (MSE) is a good choice for regression tasks.

In [ ]:
def mseLoss(y_data, y_prediction):
return ((y_data - y_prediction)**2).mean()


In [ ]:
Loss = mseLoss(y_train,y_model)
print('MSE Loss = ', Loss)


#### Forward pass function¶

Write a function for the forward propagation through an FFNN with one input, one linear output neuron, and one hidden layers with arbitrary number of neurons.

General Scheme:

• One input vector: $x$
• Affine (linear) transformation with $w_{1},~b_{1}$ are the parameter vectors (or $w_{1i},~b_{1i}$): $$l_1 = \sum_i^\text{neurons} w_{1i}x+b_{1i} = w^T_1 x + b_1 = w_1 \cdot x + b_1 = W_1\cdot X$$
• Activation (nonlinear transformation): $$h = g(l_1)$$
• Linear Output layer with a vector for weights $w_o$ and a scalar for the bias $b_o$: $$y = w_o^T h+b_o = w_o \cdot h + b_o = W_o\cdot H$$
In [ ]:
def myFFNN(X, W1, Wo ):

#     input dimensions  = 1
#     output dimensions = 1
#     hidden layers = 1
#     hidden neurons is determined by the size of W1 or W0
#     W1 : parameters of first layer
#     Wo : parameters of output layer
#     parameters :  weights and biases

# Input Layer:
# add a constant column for the biases to the input vector X
ones = np.ones((len(X),1))
l1 = X
l1 = np.append(l1, ones, axis=1)

# hidden layer: Affine and activation
a1 = np.dot(W1, l1.T)
h1 = g(a1)

# Output layer (linear layer) (2 steps)
# (a) Add a const column the h1 for the affine transformation
ones = np.ones((len(X),1))

H= np.append(h1.T, ones,axis=1).T
# (b) Affine
a = np.dot(Wo,H)
y_hat = a.T

return y_hat


#### Use the previous parameters in our forward propagation function to fit the toyDataSet_1.csv. Poot the resuts and print the associate loss function¶

In [ ]:
w11 = 2
b11 = 0.0
w21  = 1
b21  = 0.5

# make the parameters matrices
# First layer
W1 = np.array([[w11,b11]])
# Output Layer (only one bias term)
Wo = np.array([[w21,b21]])

# run the model
y_model_1 = myFFNN(x_train, W1, Wo )

# plot the prediction and the ground truth
plt.plot(x_train, y_train,'or',label='data')
plt.plot(x_train, y_model_1,'-b', label='FFNN' )
plt.legend()

Loss_1 = mseLoss(y_train,y_model_1)
print('MSE Loss = ', Loss_1)


#### Exercise: Fit a more complex dataset¶

Load the toyDataSet_2.csv from the data directory. Fit the data with your FFNN function and plot your results.

In [ ]:
toySet_2 = pd.read_csv('../data/toyDataSet_2.csv')

x_train2 = toySet_2['x'].values.reshape(-1,1)
y_train2 = toySet_2['y'].values.reshape(-1,1)

plt.plot(x_train2, y_train2,'.r',label='data')


#### Find the optimal parameters¶

In [ ]:
## your code here

w11 =
b11 =

w12 =
b12 =

w21 =
w22 =

b2  =

In [ ]:
# %load 'solutions/sol_2.py'


#### Run the model, plot and quantify the prediction¶

In [ ]:
# make the parameters matrices
# First Layer
W1 = np.array([[w11,b11], [w12,b12]])
# Output Layer (only one bias term)
Wo = np.array([[w21,w22, b2]])

# run the model
y_model2 = myFFNN(x_train2, W1, Wo )

# plot the prediction and the ground truth
plt.plot(x_train2, y_train2,'or',label='data')
plt.plot(x_train2, y_model2,'-b', label='FFNN' )
plt.legend()

Loss_2 = mseLoss(y_train2,y_model2)
print('MSE Loss = ', Loss_2)


#### More complicated function¶

Explore more the functions that this simple network can fit by using more neurons. Essentially explore what function it can generate.

In [ ]:
# Two Neurons

w11 = -.8
b11 = -.1

w12 = .4
b12 = -.1

w21  = 1.3
w22  = -.8

b2  = 0.5

# First Layer
W1 = np.array([[w11,b11], [w12,b12]])
# Output Layer (only one bias term)
Wo = np.array([[w21,w22, b2]])

# run the model
y_model_p = myFFNN(x_train2, W1, Wo )

# plot the prediction and the ground truth
plt.plot(x_train2, y_model_p,'b', label='FFNN' )

In [ ]:
# Three Neurons
w11 = -.1
b11 = .3

w12 = .9
b12 = -.1

w13 = .7
b13 = -.2

w21  = -1.
w22  = -.7
w33  = .8

b2  = 0.25

# First Layer
W1 = np.array([[w11,b11], [w12,b12], [w13,b13]])
# Output Layer (only one bias term)
Wo = np.array([[w21,w22,w33, b2]])

# run the model
y_model_p = myFFNN(x_train2, W1, Wo )

# plot the prediction and the ground truth
plt.plot(x_train2, y_model_p,'b', label='FFNN' )

In [ ]:
# Random numbers between a,b
# (b-a) * np.random.random_sample((4, 4)) + a
a = -20
b = 20

# N neurons
N = 50

# Create random parameter matrices
W1 = (b-a) * np.random.random_sample((N, 2)) + a
Wo = (b-a) * np.random.random_sample((1, N+1)) + a

# make a bigger interval
x_train_p2 = np.linspace(-2,2,1000)
x_train_p2= x_train_p2.reshape(-1,1)

# run the model
y_model_p2 = myFFNN(x_train_p2, W1, Wo )

# plot the prediction and the ground truth
plt.plot(x_train_p2, y_model_p2,'b', label='FFNN' )


## 3. Tensor Flow and Keras¶

Keras, Sequential: [Source] (https://keras.io/models/sequential/)

There are many platforms that create the forward propagation (and not only) through more complex and deep architectures. They also provide the backpropagation function for training a network, that is, to find the optimal parameters for a specific task.

Here, we use Tensor Flow (TF) and Keras to employ FFNN for regression tasks.

• Use Keras to fit the toyDataSet_2 dataset. Tune the weights manually.
• Use TF to find the optimal parameters for the same dataset. Train the FFNN by using backpropagation.

#### Import packages from keras¶

In [ ]:
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers


#### Read again the toyDataSet_2 and define again the weights in solution2¶

In [ ]:
toySet_2 = pd.read_csv('../data/toyDataSet_2.csv')

x_train2 = toySet_2['x'].values.reshape(-1,1)
y_train2 = toySet_2['y'].values.reshape(-1,1)

# plt.plot(x_train2, y_train2,'.r',label='data')

In [ ]:
# %load 'solutions/sol_2.py'
w11 = 1
b11 = .25

w12 = -1
b12 = .25

w21  = 1
w22  = 1

b2  = -0.5


#### Using TensorFlow and Keras make a FFNN to fit the toyDataSet_2. Use the same architecture and weights as previously¶

In [ ]:
model = models.Sequential(name='My_two_neurons_model_fixedWeights')

# hidden layer with 2 neurons (or nodes)

# output layer, one neuron

model.summary()


#### Manually set the parameters¶

In [ ]:
# get weights
weights = model.get_weights()
print('Initial values of the parameters')
print(weights)

# hidden layer
weights[0][0]=np.array([ w11, w12]) #weights
weights[1]=np.array([b11, b12]) # biases
# output layer
weights[2]=np.array([[w21],[w22]]) # weights
weights[3] = np.array([b2])    # bias

model.set_weights(weights)

print('\nAfter setting the parameters')
print(weights)


#### Visualize and quantify the prediction¶

In [ ]:
y_model_tf1 = model.predict(x_train2)

# plot the prediction and the ground truth
plt.plot(x_train2, y_train2,'or',label='data')
plt.plot(x_train2, y_model_tf1,'-b', label='FFNN' )
plt.legend()

Loss_tf1 = mseLoss(y_train2, y_model_tf1)
print('MSE Loss = ', Loss_tf1)


### Train the network¶

#### Back propagation¶

The backward pass is the training. It is based on chain rule and updates the parameters. The optimization is done by minimizing the loss function.

#### Baching, stochastic gradient descent, and epochs¶

Shufle and split the entire dataset in mini-batches help to escape from local minima

In [ ]:
model_t = models.Sequential(name='My_two_neurons_model_training')

# hidden layer with 2 neurons (or nodes)

# output layer, one neuron

# model_t.summary()

In [ ]:
sgd = optimizers.SGD(lr=0.01)
model_t.compile(loss='MSE',optimizer=sgd)
history = model_t.fit(x_train2, y_train2, epochs=100, batch_size=16, verbose=0)


#### Plot training & validation loss values¶

In [ ]:
plt.plot(history.history['loss'],'b')
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')


#### Visualize the prediction¶

In [ ]:
y_model_t = model_t.predict(x_train2)

# plot the prediction and the ground truth
plt.plot(x_train2, y_train2,'.r',label='data')
plt.plot(x_train2, y_model_t,'b', label='FFNN' )
plt.legend()

Loss_t = mseLoss(y_train2, y_model_t)
print('MSE Loss = ', Loss_t)


#### Check the parameters¶

In [ ]:
weights_t = model_t.get_weights()

print("TF weights:\n", weights_t)
print()
print("my weights:\n", weights)


#### Add more neurons and also take a look during the training¶

In [ ]:
model = models.Sequential(name='My_two_neurons_model_training')
# hidden layer with 2 neurons (or nodes)
# output layer, one neuron

sgd = optimizers.SGD(lr=0.01)
model.compile(loss='MSE',optimizer=sgd)

def plotting(model_t, ax, title):
y_model_t = model_t.predict(x_train2)
# plot the prediction and the ground truth
ax.plot(x_train2, y_train2,'.r',label='data')
ax.plot(x_train2, y_model_t,'b', label='FFNN' )
ax.legend()
loss = mseLoss(y_train2, y_model_t)
ax.set_title(title + ' - MSE Loss = ' + str(np.round(loss,5)))

f, ax = plt.subplots(2,3, figsize=(16,7.5))
ax = ax.ravel()
plotting(model, ax[0], 'Epoch 0')
for i in range(5):
model.fit(x_train2, y_train2, epochs=100, batch_size=32, verbose=0)
plotting(model, ax[i+1], 'Epoch '+str(100*(i+1)))

In [ ]:
model.summary()


### Let's fit something very nonlinear¶

In [ ]:
toySet_3 = pd.read_csv('../data/toyDataSet_3.csv')

x_train3 = toySet_3['x'].values.reshape(-1,1)
y_train3 = toySet_3['y'].values.reshape(-1,1)

In [ ]:
plt.plot(x_train3, y_train3,'or')


Desing now a FFNN with two-hidden layers with tanh activation function

In [ ]:
model = models.Sequential(name='MyNet')

# hidden layer with 20 neurons (or nodes)
# Add another hidden layer of 20 neurons
# hidden layer with 20 neurons (or nodes)
# output layer, one neuron

# model.summary()

In [ ]:
sgd = optimizers.SGD(lr=0.02)
model.compile(loss='MSE',optimizer=sgd)
history = model.fit(x_train3, y_train3, epochs=1000, batch_size=16, verbose=0)

In [ ]:
# Log-scale is helpful since the loss decays fast
plt.loglog(history.history['loss'],'b')
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

In [ ]:
y_model = model.predict(x_train3)

# plot the prediction and the ground truth
plt.plot(x_train3, y_train3,'.r',label='data')
plt.plot(x_train3, y_model,'b', label='FFNN' )
plt.legend()

Loss_t = mseLoss(y_train3, y_model)
print('MSE Loss = ', Loss_t)


## Info for the Tensor Flow 2 and Keras¶

#### Instructions for running tf.keras with Tensorflow 2.0:¶

1. Create a conda virtual environment by cloning an existing one that you know works

conda create --name myclone --clone myenv

All references to Keras should be written as tf.keras. For example:

model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation='softmax')
])

loss='sparse_categorical_crossentropy',
metrics=['accuracy'])

tf.keras.models.Sequential
tf.keras.layers.Dense, tf.keras.layers.Activation,
tf.keras.layers.Dropout, tf.keras.layers.Flatten, tf.keras.layers.Reshape
tf.keras.optimizers.SGD
tf.keras.preprocessing.image.ImageDataGenerator
tf.keras.regularizers
tf.keras.datasets.mnist

You could avoid the long names by using

from tensorflow import keras
from tensorflow.keras import layers

These imports do not work on some systems, however, because they pick up previous versions of keras and tensorflow. That is why I avoid them in this lab.

In [ ]: