# CS109A Introduction to Data Science

## Standard Section 10: Feed Forward Neural Networks, Regularization, SGD Solver

Harvard University
Fall 2019
Instructors: Pavlos Protopapas, Kevin Rader, and Chris Tanner
Section Leaders: Marios Mattheakis, Abhimanyu (Abhi) Vasishth, Robbert (Rob) Struyven

In [122]:
#RUN THIS CELL
import requests
from IPython.core.display import HTML
styles = requests.get("https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css").text
HTML(styles)

Out[122]:

The goal of this section is to solve a difficult regression task and to become familiar with regularization and with the solver used to train neural networks (NNs).

Specifically, we will:

1. Use NNs to solve a regression task where polynomial regression fails
2. Fit noisy data and observe underfitting and overfitting
3. Learn about early-stopping and regularization
4. Explore the SGD solver

#### Import packages

In [123]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
# from pandas import DataFrame

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures

import copy
import operator


## 1. Regression: Neural Nets vs. Polynomial Regression

We will try to fit a difficult function where polynomial regression fails.

The dielectric function of many optical materials depends on the frequency and is given by the Lorentz model as: $$\varepsilon(\omega) = 1 - \frac{\omega_0^2}{\omega_0^2-\omega^2 +i\omega\Gamma},$$ where $\omega$ is the frequency, $\omega_0$ is the resonance frequency of the bound electrons, and $\Gamma$ is the electron damping.

In many situations we measure the real part of the dielectric function in the lab and then want to fit the observations. Let's assume that we perform an experiment and that the observations come from a Lorentz material.
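Since only the real part is measured, it can help to write it in closed form by multiplying the numerator and denominator of the fraction by the complex conjugate of the denominator. The following is a minimal sketch, not part of the original notebook: the helper names `lorentz_complex` and `lorentz_real` are hypothetical, and the defaults $\omega_0 = 1$, $\Gamma = 0.07$ are the values used in this section's code.

```python
import numpy as np

def lorentz_complex(w, w0=1.0, gamma=7e-2):
    """Complex Lorentz dielectric function, written exactly as in the formula."""
    return 1 - w0**2 / (w0**2 - w**2 + 1j * w * gamma)

def lorentz_real(w, w0=1.0, gamma=7e-2):
    """Closed-form real part: 1 - w0^2 * d / (d^2 + (w*gamma)^2), with d = w0^2 - w^2."""
    d = w0**2 - w**2
    return 1 - w0**2 * d / (d**2 + (w * gamma)**2)

w_grid = np.linspace(0, 2, 128)
# both routes give the same real part on the whole grid
same = np.allclose(lorentz_complex(w_grid).real, lorentz_real(w_grid))
print(same)
```

The closed form makes the sharp feature near $\omega = \omega_0$ explicit: the denominator becomes very small there, which is what makes the curve hard to fit with a polynomial.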

#### Lorentz model

In [7]:
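def Lorentz(w, w0=1):
    # real part of the Lorentz dielectric function defined above
    Gamma = 7e-2
    # damping term i*w*Gamma, as in the formula
    eps = 1 - w0**2/(w0**2 - w**2 + 1j*w*Gamma)
    return eps.real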

In [8]:
plt.figure(figsize=[8,4] )
w = np.linspace(0,2,128)
e = Lorentz(w)
wTest = np.linspace(0.01,1.95, 64)
eTest = Lorentz(wTest)

plt.plot(w,e,'ob',label='Train')
plt.plot(wTest,eTest,'or',label='Test')
plt.legend()
# raw strings avoid invalid-escape warnings for the LaTeX labels
plt.xlabel(r'$\omega$')
plt.ylabel(r'$\epsilon$')
plt.ylim([-10,10]);


### Using polynomial regression to fit the data

In [9]:
x = copy.copy(w)
y = copy.copy(e)

# transforming the data to include another axis
x = x[:, np.newaxis]
y = y[:, np.newaxis]

polynomial_features= PolynomialFeatures(degree=15)
x_poly = polynomial_features.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)
y_poly_pred = model.predict(x_poly)

rmse = np.sqrt(mean_squared_error(y,y_poly_pred))
r2 = r2_score(y,y_poly_pred)
# print(rmse)
# print(r2)

plt.plot(x,y,'ob')
# sort the values of x before line plot
sort_axis = operator.itemgetter(0)
sorted_zip = sorted(zip(x,y_poly_pred), key=sort_axis)
x, y_poly_pred = zip(*sorted_zip)
plt.plot(x, y_poly_pred, color='m',linewidth=2)
plt.show()


### Using Neural Networks

#### Design the Network

In [46]:
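model_1 = models.Sequential(name='LorentzModel')

# Assumption: layer sizes follow the comments; the tanh activation is our
# choice and is not given in the source.
# hidden layer with 20 neurons (or nodes)
model_1.add(layers.Dense(20, activation='tanh', input_shape=(1,)))
# second hidden layer with 20 neurons (or nodes)
model_1.add(layers.Dense(20, activation='tanh'))

# output layer, one neuron
model_1.add(layers.Dense(1, activation='linear'))

# model_1.summary()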


#### Select a solver and train the NN

In [47]:
sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9)
model_1.compile(loss='MSE',optimizer=sgd)
history_1 = model_1.fit(w, e, validation_data=(wTest,eTest), epochs=800, batch_size= 32, verbose=0)
# history_1 = model_1.fit(w, e, epochs=800, batch_size=32, verbose=0, validation_split=.1)


#### Plot the training and validation loss

In [48]:
plt.plot(history_1.history['loss'],'b',label='train')
plt.plot(history_1.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

Out[48]:
Text(0.5, 0, 'Epoch')

#### Visualize the model prediction

In [49]:
e_hat = model_1.predict(w)

# plot the prediction and the ground truth
plt.plot(w, e,'.r',label='data')
plt.plot(w, e_hat,'b', label='FFNN' )
plt.legend()

Out[49]:

## 2. Noisy data, underfitting and overfitting

### In real experiments we always have noise

Hence, in reality our observations follow $$\varepsilon(\omega) = 1 - \frac{\omega_0^2}{\omega_0^2-\omega^2 +i\omega\Gamma} + \epsilon,$$ where $\epsilon$ is white noise.

Our goal is to discover the underlying law, namely the Lorentz model, by using neural networks.
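A useful reference point when fitting noisy data: even the true function has a non-zero mean squared error against the noisy observations, roughly the noise variance $\sigma^2$. A training loss far below that level suggests the network is fitting the noise rather than the underlying law. The sketch below is a hedged illustration, not part of the original notebook; it assumes $\sigma = 1$, $\omega_0 = 1$, $\Gamma = 0.07$, and an arbitrary seed chosen only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
sigma = 1.0                      # assumed noise standard deviation

w_obs = np.linspace(0, 2, 128)
# real part of the Lorentz model with w0 = 1 and Gamma = 0.07
clean = 1 - (1 / (1 - w_obs**2 + 1j * w_obs * 7e-2)).real
noisy = clean + rng.normal(loc=0.0, scale=sigma, size=w_obs.shape)

# MSE of the *true* function against the noisy observations
mse_of_truth = np.mean((noisy - clean)**2)
print(mse_of_truth)   # close to sigma**2, up to sampling fluctuations
```

With 128 points the estimate fluctuates around $\sigma^2$, so it should be read as an order-of-magnitude floor rather than an exact threshold.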

In [51]:
plt.figure(figsize=[8,4] )
Ntrain = 128
w2 = np.linspace(0,2, Ntrain)
sigNoise = 1
e2_clean = Lorentz(w2)
e2 = e2_clean +  np.random.normal(loc=0, scale= sigNoise, size=w2.shape)

wTest2 = np.linspace(0.01,1.95, 64)
eTest2 = Lorentz(wTest2) +  np.random.normal(loc=0, scale= sigNoise, size=wTest2.shape)

plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.legend()
plt.xlabel(r'$\omega$')
plt.ylabel(r'$\epsilon$')
plt.ylim([-13,13])

Out[51]:
(-13, 13)

### Discover the underlying function

In [63]:
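n_neurons = 50

model_2 = models.Sequential(name='noiseLorentzModel')
# Assumption: the tanh activation is our choice; it is not given in the source.
# first hidden layer
model_2.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer
model_2.add(layers.Dense(n_neurons, activation='tanh'))
# output layer, one neuron
model_2.add(layers.Dense(1, activation='linear'))
# model_2.summary()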

In [64]:
sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9)
model_2.compile(loss='MSE',optimizer=sgd)
history_2 = model_2.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=800, batch_size=32, verbose=0)

In [65]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_2.history['loss'],'b',label='train')
plt.loglog(history_2.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_2 = model_2.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2,'g', label='FFNN' , linewidth=2,)

plt.legend()

Out[65]:

### Underfitting

We use the same architecture but train for fewer epochs (300 instead of 800).

In [75]:
model_2_uf = models.Sequential(name='noiseLorentzModel_underFitting')

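# Assumption: n_neurons comes from the earlier cell; tanh is our choice, not given in the source.
# first hidden layer
model_2_uf.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer
model_2_uf.add(layers.Dense(n_neurons, activation='tanh'))
# output layer, one neuron
model_2_uf.add(layers.Dense(1, activation='linear'))

sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9)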
model_2_uf.compile(loss='MSE',optimizer=sgd)
history_2_uf = model_2_uf.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=300, batch_size=32, verbose=0)

In [93]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_2_uf.history['loss'],'b',label='train')
plt.loglog(history_2_uf.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_2_uf = model_2_uf.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_uf,'g', label='FFNN' , linewidth=2,)

plt.legend()

Out[93]:

### Overfitting

We add more neurons and train for much longer (5000 epochs).

In [74]:
model_2_of = models.Sequential(name='noiseLorentzModel_overFitting')

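# Assumption: 150 neurons per hidden layer (matching the 150-unit weight arrays
# printed later in this section); tanh is our choice, not given in the source.
n_neurons = 150
# first hidden layer
model_2_of.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer
model_2_of.add(layers.Dense(n_neurons, activation='tanh'))
# output layer, one neuron
model_2_of.add(layers.Dense(1, activation='linear'))

sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9)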
model_2_of.compile(loss='MSE',optimizer=sgd)
history_2_of = model_2_of.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=5000, batch_size=32, verbose=0)


In [92]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_2_of.history['loss'],'b',label='train')
plt.loglog(history_2_of.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_2_of = model_2_of.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_of,'g', label='FFNN' , linewidth=2,)

plt.legend()

Out[92]:

## Regularization

The easiest way to avoid overfitting is early stopping: stop the training when the validation loss reaches its minimum. Early stopping does not change the model. Regularization, on the other hand, changes the model because it changes the loss function.
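The early-stopping rule can be sketched in a few lines of plain Python. This is an illustrative sketch, not the notebook's method: the function name `best_epoch`, the `patience` parameter, and the toy loss values are assumptions. It scans a validation-loss history and stops once the loss has not improved for `patience` consecutive epochs.

```python
def best_epoch(val_losses, patience=20):
    """Return (epoch with the lowest validation loss, epoch at which we stop).

    Training stops once `patience` epochs have passed without improvement.
    """
    best, best_idx = float("inf"), -1
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, best_idx = loss, i
        elif i - best_idx >= patience:
            return best_idx, i
    return best_idx, len(val_losses) - 1

# toy validation curve: improves, then degrades (overfitting)
toy_val = [5.0, 4.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = best_epoch(toy_val, patience=3)
print(result)
```

In practice one would apply this to `history.history['val_loss']`, or use the built-in `tf.keras.callbacks.EarlyStopping` callback (with `monitor`, `patience`, and `restore_best_weights`) during `fit`.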

Two common regularization methods are the so-called $L_1$ and $L_2$.

• $L_1$ penalizes the sum of the absolute values of the parameters. It drives as many parameters as possible to exactly zero, which reduces the model complexity. $L_1$ wants the smallest number of non-zero parameters.
• $L_2$ penalizes the sum of the squared parameters, which gives a more stable network. It does not care about the number of non-zero parameters, only about their magnitudes. $L_2$ wants parameters with small values.

Warning! In the extreme limit of too large regularization coefficients both $L_1$ and $L_2$ lead to zero parameters. Hence, overusing regularization yields underfitting.
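The different behaviour of the two penalties can be seen on a small weight vector. The sketch below is an illustration under stated assumptions, not the way Keras applies regularizers during training: it uses the penalty forms $\lambda\sum|w|$ and $\lambda\sum w^2$ and applies one proximal (shrinkage) step for each, with hypothetical helper names and an arbitrary step size `t`.

```python
import numpy as np

def l1_penalty(w, lam):
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    return lam * np.sum(w**2)

def shrink_l1(w, t):
    """Soft-thresholding, the proximal step for t*|w|: small weights become exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def shrink_l2(w, t):
    """Proximal step for t*w**2: every weight is rescaled by 1/(1 + 2t)."""
    return w / (1.0 + 2.0 * t)

weights = np.array([2.0, -0.5, 0.05, -0.01])   # toy weights, for illustration
after_l1 = shrink_l1(weights, t=0.1)
after_l2 = shrink_l2(weights, t=0.1)

print(np.count_nonzero(after_l1), np.count_nonzero(after_l2))
```

The $L_1$ step removes the two smallest weights entirely, while the $L_2$ step keeps all four but makes each of them smaller. Taking `t` very large sends every weight to (or toward) zero in both cases, which is the underfitting limit described in the warning.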

Weight decay is another common way to regularize a network. After each update step, the weights are multiplied by a factor slightly less than 1. This prevents the weights from growing too large and, for plain gradient descent, can be seen as gradient descent on a quadratic regularization term.
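The link between weight decay and a quadratic penalty can be checked numerically for a single plain SGD step (no momentum). This is a minimal sketch with assumed values: the random vectors stand in for the weights and the data-loss gradient, and `lr` and `lam` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
w0 = rng.normal(size=5)       # current weights (toy values)
grad = rng.normal(size=5)     # stand-in for the gradient of the data loss
lr, lam = 0.01, 0.003

# (a) one SGD step on data_loss + lam * sum(w**2); the penalty gradient is 2*lam*w
w_l2 = w0 - lr * (grad + 2 * lam * w0)

# (b) weight decay: multiply by a factor slightly below 1, then take a plain step
decay_factor = 1 - 2 * lr * lam
w_wd = decay_factor * w0 - lr * grad

print(np.allclose(w_l2, w_wd))
```

The equivalence holds for this plain update only; with momentum or adaptive optimizers the two schemes generally differ.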

### $L_1$ Regularizer

In [78]:
kernel_weight = 0.003
bias_weight = 0.003

model_2_of_l1 = models.Sequential(name='noiseLorentzModel_l1')

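# Assumption: only the bias_regularizer line survives in the source; the rest of
# each layer (n_neurons from an earlier cell, tanh, kernel_regularizer) is reconstructed.
# first hidden layer
model_2_of_l1.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer
model_2_of_l1.add(layers.Dense(n_neurons, activation='tanh',
                               kernel_regularizer=tf.keras.regularizers.l1(kernel_weight),
                               bias_regularizer=tf.keras.regularizers.l1(bias_weight)))
# output layer, one neuron
model_2_of_l1.add(layers.Dense(1, activation='linear'))

sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9)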
model_2_of_l1.compile(loss='MSE',optimizer=sgd)
history_2_of_l1 = model_2_of_l1.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=5000, batch_size=32, verbose=0)

In [94]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_2_of_l1.history['loss'],'b',label='train')
plt.loglog(history_2_of_l1.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_2_of_l1 = model_2_of_l1.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_of_l1,'g', label='FFNN' , linewidth=2,)

plt.legend()

Out[94]:

### $L_2$ Regularizer

In [80]:
kernel_weight = 0.003
bias_weight = 0.003

model_2_of_l2 = models.Sequential(name='noiseLorentzModel_l2')

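# Assumption: only the bias_regularizer line survives in the source; the rest of
# each layer (n_neurons from an earlier cell, tanh, kernel_regularizer) is reconstructed.
# first hidden layer
model_2_of_l2.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer
model_2_of_l2.add(layers.Dense(n_neurons, activation='tanh',
                               kernel_regularizer=tf.keras.regularizers.l2(kernel_weight),
                               bias_regularizer=tf.keras.regularizers.l2(bias_weight)))
# output layer, one neuron
model_2_of_l2.add(layers.Dense(1, activation='linear'))

sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9)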
model_2_of_l2.compile(loss='MSE',optimizer=sgd)
history_2_of_l2 = model_2_of_l2.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=5000, batch_size=32, verbose=0)

In [95]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_2_of_l2.history['loss'],'b',label='train')
plt.loglog(history_2_of_l2.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_2_of_l2 = model_2_of_l2.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_of_l2,'g', label='FFNN' , linewidth=2,)

plt.legend()

Out[95]:

### Underfitting through Regularization

In [83]:
kernel_weight = 1
bias_weight = 1

model_2_of_l2_uf = models.Sequential(name='noiseLorentzModel_l2_uf')

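# Assumption: only the bias_regularizer line survives in the source; the rest of
# each layer (n_neurons from an earlier cell, tanh, kernel_regularizer) is reconstructed.
# first hidden layer
model_2_of_l2_uf.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer
model_2_of_l2_uf.add(layers.Dense(n_neurons, activation='tanh',
                                  kernel_regularizer=tf.keras.regularizers.l2(kernel_weight),
                                  bias_regularizer=tf.keras.regularizers.l2(bias_weight)))
# output layer, one neuron
model_2_of_l2_uf.add(layers.Dense(1, activation='linear'))

sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9)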
model_2_of_l2_uf.compile(loss='MSE',optimizer=sgd)
history_2_of_l2_uf = model_2_of_l2_uf.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=5000, batch_size=32, verbose=0)

In [96]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_2_of_l2_uf.history['loss'],'b',label='train')
plt.loglog(history_2_of_l2_uf.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_2_of_l2_uf = model_2_of_l2_uf.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_of_l2_uf,'g', label='FFNN' , linewidth=2,)

plt.legend()

Out[96]:

In [86]:
w_l1 = model_2_of_l2_uf.get_weights()
print(w_l1)

[array([[-2.88417112e-07,  1.27527655e-05, -1.16637480e-07,
1.27158529e-07, -4.84074462e-07,  1.43536795e-06,
-6.86162548e-06, -1.04252172e-06, -2.55289251e-06,
-3.07931236e-06,  3.92774746e-06,  4.16653438e-06,
1.18414084e-06, -2.58007901e-07, -2.13642478e+00,
2.44120292e-06, -8.36712161e-07, -9.88112902e-07,
-9.44330736e-07, -1.05971071e-06,  1.96455017e-06,
-1.45048972e-07, -3.33212574e-06, -4.09947489e-07,
-2.95707082e-06,  1.24698761e-06, -3.00997317e-06,
-1.00888576e-06,  1.24241026e-07, -7.72184558e-06,
-6.22474681e-06, -2.13643265e+00,  1.23318694e-06,
2.50667017e-06,  1.11350982e-06, -1.76371100e-06,
-2.08186165e-07,  8.18114495e-06, -8.95465348e-07,
-2.61973163e-08,  5.01561495e-08, -3.92719448e-06,
2.29828856e-07, -3.77010508e-07, -5.46358422e-07,
2.74788044e-06,  2.31198260e-06,  1.33440528e-06,
4.40988288e-07,  3.49153220e-06, -1.82652673e-06,
-1.16432284e-06,  9.49320747e-06,  2.44958500e-07,
6.31779358e-07, -1.73797844e-07, -3.16343380e-06,
1.41480348e-06,  1.34047468e-07,  5.35134041e-06,
1.45339445e-05, -2.37776976e-06, -1.65201016e-06,
-1.77797267e-06, -1.07609947e-06, -2.80373615e-06,
2.78306516e-06, -2.20638890e-06,  2.28984800e-06,
8.36447100e-07,  5.35679101e-06, -3.97811118e-06,
1.08231472e-06, -3.13787109e-06,  1.65237066e-06,
-1.21595713e-06, -1.33467574e-06,  1.50074516e-06,
-2.79652591e-06,  2.95822474e-07, -6.01960392e-06,
1.74636375e-06,  1.83343036e-06,  6.28173609e-07,
-1.95113967e-06, -1.28427788e-03, -3.50337518e-06,
-4.88337264e-07,  5.53414850e-07,  1.44616006e-05,
7.85187922e-07,  1.02689373e-05, -2.28825661e-06,
-2.66807039e-08, -3.05666367e-06, -4.81291090e-06,
-1.86799230e-06,  2.28896988e-06, -1.89967795e-06,
-9.42158522e-07,  5.87020713e-06,  2.95284985e-06,
-6.90401566e-06, -1.34891900e-06,  1.49266998e-06,
-2.48025003e-06, -6.95027724e-09, -3.59443021e-07,
-5.06257685e-08,  1.53962992e-06,  3.11969529e-06,
1.86097225e-06,  1.11225427e-06, -9.04990543e-07,
-4.07939842e-06, -6.20864625e-07, -1.69429518e-06,
2.36994651e-06, -3.33540584e-07, -1.73009039e-06,
3.42653379e-06, -3.57558179e-07, -6.63855474e-07,
-1.17998559e-06,  2.31222032e-07,  1.74490893e-07,
-1.98208249e-06, -8.47229160e-07,  3.86745069e-06,
6.91940045e-07,  5.81900736e-07,  2.93065386e-06,
3.74254967e-07,  3.76247864e+01,  2.02175806e-06,
6.16663058e-08,  3.79456242e-06,  1.55372391e-06,
-2.37112431e-07,  3.39812402e-07,  2.01163607e-06,
-1.54178542e-06, -6.63706942e-07, -8.31436864e-06,
-1.38172913e-06, -6.23475330e-07,  2.63843117e-06,
-1.49550465e-06,  9.01490580e-07, -1.31065747e-06]], dtype=float32), array([ 3.7592869e-07, -1.6622209e-05,  1.5202757e-07, -1.6574100e-07,
6.3095223e-07, -1.8708870e-06,  8.9435798e-06,  1.3588437e-06,
3.3274912e-06,  4.0136370e-06, -5.1195061e-06, -5.4307443e-06,
-1.5434324e-06,  3.3629269e-07,  2.2912655e+00, -3.1819125e-06,
1.0905874e-06,  1.2879261e-06,  1.2308592e-06,  1.3812482e-06,
-2.5606346e-06,  1.8905975e-07,  4.3431592e-06,  5.3433371e-07,
3.8543044e-06, -1.6253483e-06,  3.9232591e-06,  1.3150022e-06,
-1.6193825e-07,  1.0064806e-05,  8.1134604e-06,  2.2912762e+00,
-1.6073602e-06, -3.2672438e-06, -1.4513711e-06,  2.2988550e-06,
2.7135400e-07, -1.0663469e-05,  1.1671672e-06,  3.4146101e-08,
-6.5374536e-08,  5.1187817e-06, -2.9956351e-07,  4.9140294e-07,
7.1213458e-07, -3.5816427e-06, -3.0134845e-06, -1.7392907e-06,
-5.7479281e-07, -4.5509350e-06,  2.3807311e-06,  1.5176017e-06,
-1.2373635e-05, -3.1928386e-07, -8.2347390e-07,  2.2653154e-07,
4.1232829e-06, -1.8440833e-06, -1.7472017e-07, -6.9750436e-06,
-1.8943836e-05,  3.0992312e-06,  2.1532626e-06,  2.3174455e-06,
1.4026092e-06,  3.6544457e-06, -3.6275023e-06,  2.8758509e-06,
-2.9846337e-06, -1.0902418e-06, -6.9821485e-06,  5.1851503e-06,
-1.4107104e-06,  4.0899627e-06, -2.1537328e-06,  1.5849027e-06,
1.7396429e-06, -1.9561012e-06,  3.6450476e-06, -3.8558096e-07,
7.8460716e-06, -2.2762449e-06, -2.3897296e-06, -8.1877420e-07,
2.5431543e-06,  1.6739520e-03,  4.5663692e-06,  6.3650833e-07,
-7.2133196e-07, -1.8849534e-05, -1.0234299e-06, -1.3384738e-05,
2.9825590e-06,  3.4776157e-08,  3.9841166e-06,  6.2732429e-06,
2.4347773e-06, -2.9834887e-06,  2.4760775e-06,  1.2280282e-06,
-7.6513461e-06, -3.8488020e-06,  8.9988316e-06,  1.7582079e-06,
-1.9455763e-06,  3.2328071e-06,  9.0591303e-09,  4.6850516e-07,
6.5986647e-08, -2.0067844e-06, -4.0662717e-06, -2.4256281e-06,
-1.4497344e-06,  1.1795827e-06,  5.3171707e-06,  8.0924724e-07,
2.2083780e-06, -3.0890351e-06,  4.3474347e-07,  2.2550339e-06,
-4.4662129e-06,  4.6604848e-07,  8.6528229e-07,  1.5380162e-06,
-3.0137940e-07, -2.2743490e-07,  2.5834863e-06,  1.1042954e-06,
-5.0409117e-06, -9.0188843e-07, -7.5846066e-07, -3.8198737e-06,
-4.8781129e-07, -3.7685844e+01, -2.6351997e-06, -8.0377028e-08,
-4.9459086e-06, -2.0251546e-06,  3.0905704e-07, -4.4291832e-07,
-2.6220064e-06,  2.0095933e-06,  8.6508879e-07,  1.0837116e-05,
1.8009732e-06,  8.1265017e-07, -3.4389827e-06,  1.9492704e-06,
-1.1750211e-06,  1.7083370e-06], dtype=float32), array([[-4.10886258e-09,  5.44422241e-09,  1.12696563e-08, ...,
1.01308084e-09, -2.68624301e-09,  5.22214716e-10],
[ 1.81679098e-07, -2.40723665e-07, -4.98303621e-07, ...,
-4.47947635e-08,  1.18775944e-07, -2.30904611e-08],
[-1.66164627e-09,  2.20167062e-09,  4.55750637e-09, ...,
4.09694917e-10, -1.08633103e-09,  2.11186527e-10],
...,
[-2.13053113e-08,  2.82294366e-08,  5.84355710e-08, ...,
5.25303667e-09, -1.39287302e-08,  2.70779421e-09],
[ 1.28428628e-08, -1.70167311e-08, -3.52249856e-08, ...,
-3.16653326e-09,  8.39625081e-09, -1.63226066e-09],
[-1.86719529e-08,  2.47402276e-08,  5.12128402e-08, ...,
4.60375515e-09, -1.22071242e-08,  2.37310638e-09]], dtype=float32), array([ 6.9263903e-04, -9.1537647e-04, -1.8582679e-03, -8.6006563e-05,
-1.8961675e-04, -6.3404790e-04,  7.1726227e-04,  1.4960077e-03,
-1.9075507e-03, -1.0198024e-03, -5.8383564e-04, -7.7785971e-04,
9.9176634e-04,  4.9567968e-04, -1.3956660e-03, -9.3661644e-04,
6.8847183e-04, -8.4538641e-04, -8.4630301e-05,  3.3667600e-03,
1.1513699e-03,  7.3868083e-04, -4.8228004e-04,  3.8174330e-04,
6.6337176e-04,  8.4649015e-05, -8.9763314e-05,  4.1558500e-04,
2.8803863e-04,  1.4788480e-03,  1.4005979e-03,  1.9555381e-03,
-8.4131747e-04,  7.6824636e-04, -8.0400496e-04, -1.7430540e-03,
1.0631359e-03,  1.3165418e-03, -7.8513753e-04, -2.4282478e-04,
-1.8962333e-04, -8.7219966e-04, -4.6748249e-04,  2.4226133e-04,
1.7403658e-03, -2.0569918e-04, -2.9222993e-04, -1.5053046e-03,
2.1739944e-04,  2.3497007e-04, -2.9648544e-04, -6.9273449e-04,
7.4854237e-04,  7.7706459e-04,  4.6976656e-03, -3.3625832e-04,
-1.1255499e-03,  8.9332275e-04,  4.9856654e-04,  5.5233564e-04,
-3.2161933e-04, -5.1686913e-04, -6.3532940e-04, -2.9614073e-04,
-9.3045761e-05,  6.2411098e-05,  3.5030756e-04,  5.4064149e-04,
-1.4655702e-04, -1.7123058e-04, -3.6407053e-04,  8.8821677e-04,
2.1117995e-04,  2.3532979e-04,  8.7957387e-04, -4.6700588e-04,
-1.7037743e-04, -5.2308675e-04, -3.7289131e-04,  1.6234192e-04,
3.4899553e-03, -3.5835779e-04, -2.1936349e-04,  1.3014348e-04,
1.2250510e-03, -1.6847672e-04, -5.9803110e-04, -4.9893497e-05,
-1.0368922e-03, -4.0937553e-04, -9.9769700e-04,  5.7169644e-04,
-7.9602702e-04, -9.8119001e-04, -3.8751878e-04,  2.2816714e-03,
3.1650392e-04, -8.9043914e-04,  3.7664629e-04, -6.5263448e-05,
-6.8433350e-04, -1.9627070e-04, -3.5797816e-04,  2.1901599e-04,
1.7351424e-02, -1.0822411e-03,  2.6112271e-04,  6.9150538e-04,
-6.4443261e-04,  6.1023631e-04,  1.8032314e-03,  4.3882115e-04,
-1.0913245e-03,  6.5822940e-05,  9.7107654e-04, -2.3037521e-03,
9.0284692e-04,  7.2213076e-04, -8.5996324e-04,  2.6774083e-05,
7.0166495e-04,  2.2808732e-03, -2.1146849e-04, -1.4010730e-04,
3.8797373e-04,  5.9421221e-04,  1.8837927e-03,  3.9837512e-05,
-6.1478605e-04,  9.8467595e-04,  2.6815565e-04,  4.2062858e-04,
-8.0611953e-04,  3.6163151e-04,  6.7630230e-05, -5.8004342e-04,
1.2767658e-04,  9.3256589e-04, -1.4607129e-03, -1.1284142e-03,
-2.3228749e-05, -6.9869775e-04, -4.3664465e-04,  9.5241633e-04,
5.8255694e-04, -7.3297400e-05, -1.3849903e-03, -1.7132581e-04,
4.5371219e-04, -8.8329078e-05], dtype=float32), array([[-0.22093561],
[ 0.29252952],
[ 0.60225844],
[ 0.02736716],
[ 0.06034451],
[ 0.2021632 ],
[-0.2288298 ],
[-0.48164615],
[ 0.6188689 ],
[ 0.3262449 ],
[ 0.18609525],
[ 0.24827772],
[-0.31718072],
[-0.15791807],
[ 0.44864646],
[ 0.29937983],
[-0.2196003 ],
[ 0.2699878 ],
[ 0.02692894],
[-1.1446787 ],
[-0.36888862],
[-0.23570329],
[ 0.15363969],
[-0.12155622],
[-0.21155523],
[-0.02693519],
[ 0.02856238],
[-0.13235079],
[-0.09168809],
[-0.47599408],
[-0.45026237],
[-0.6350904 ],
[ 0.26867983],
[-0.24519327],
[ 0.2566829 ],
[ 0.5636329 ],
[-0.34026867],
[-0.4227223 ],
[ 0.2506168 ],
[ 0.07728675],
[ 0.06034685],
[ 0.2786174 ],
[ 0.14891388],
[-0.07710703],
[-0.5627314 ],
[ 0.0654643 ],
[ 0.09302402],
[ 0.48471424],
[-0.06918968],
[-0.07478518],
[ 0.09437988],
[ 0.2209659 ],
[-0.23886694],
[-0.24802363],
[-1.7416492 ],
[ 0.10705478],
[ 0.36050543],
[-0.28542224],
[-0.15884045],
[-0.17602113],
[ 0.10238892],
[ 0.16468497],
[ 0.20257276],
[ 0.09427051],
[ 0.02960727],
[-0.01985889],
[-0.11153305],
[-0.1722837 ],
[ 0.04663718],
[ 0.05449111],
[ 0.11591919],
[-0.28377602],
[-0.06720978],
[-0.07489955],
[-0.28099436],
[ 0.14876083],
[ 0.0542204 ],
[ 0.16667284],
[ 0.11873418],
[-0.05166222],
[-1.1933751 ],
[ 0.11409934],
[ 0.06981548],
[-0.04141351],
[-0.39285943],
[ 0.05361446],
[ 0.1906363 ],
[ 0.0158755 ],
[ 0.33177376],
[ 0.13036917],
[ 0.31909755],
[-0.18221223],
[ 0.2541176 ],
[ 0.31376725],
[ 0.12339851],
[-0.7468232 ],
[-0.10075898],
[ 0.28449127],
[-0.11993048],
[ 0.0207664 ],
[ 0.21827388],
[ 0.06246376],
[ 0.11397963],
[-0.06970492],
[-5.026797  ],
[ 0.3464622 ],
[-0.08311414],
[-0.22057138],
[ 0.20549016],
[-0.19454165],
[-0.5837827 ],
[-0.13976452],
[ 0.3494015 ],
[-0.02094436],
[-0.3104995 ],
[ 0.7544901 ],
[-0.2884907 ],
[-0.230394  ],
[ 0.2746778 ],
[-0.00851912],
[-0.22382897],
[-0.74654615],
[ 0.06730189],
[ 0.04458465],
[-0.12354336],
[-0.18941501],
[-0.61085904],
[-0.01267572],
[ 0.19599682],
[-0.31489128],
[-0.0853548 ],
[-0.13396068],
[ 0.25735834],
[-0.11514252],
[-0.02151958],
[ 0.18488023],
[-0.04062889],
[-0.29807293],
[ 0.4700226 ],
[ 0.36143357],
[ 0.00739107],
[ 0.22287814],
[ 0.13906974],
[-0.30447423],
[-0.18568598],
[ 0.02332265],
[ 0.44514015],
[ 0.05452228],
[-0.14451884],
[ 0.02810591]], dtype=float32), array([0.86910945], dtype=float32)]


### Weight decay & clipping

In [88]:
kernel_weight = 0.003
bias_weight = 0.003

model_2_wd = models.Sequential(name='noiseLorentzModel_sgd_weightdecay')

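# Assumption: only the bias_regularizer line survives in the source; the rest of
# each layer (n_neurons from an earlier cell, tanh, kernel_regularizer) is reconstructed.
# first hidden layer
model_2_wd.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer
model_2_wd.add(layers.Dense(n_neurons, activation='tanh',
                            kernel_regularizer=tf.keras.regularizers.l2(kernel_weight),
                            bias_regularizer=tf.keras.regularizers.l2(bias_weight)))
# output layer, one neuron
model_2_wd.add(layers.Dense(1, activation='linear'))

# clipvalue clips each gradient component to [-0.5, 0.5]; in TF 2.0-era Keras,
# `decay` is an inverse-time decay of the learning rate
sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9, clipvalue=0.5, decay=1e-3)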
model_2_wd.compile(loss='MSE',optimizer=sgd)
history_2_wd = model_2_wd.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=5000, batch_size=32, verbose=0)

In [98]:
plt.figure(figsize=[15,6])
plt.subplot(2,2,1)
plt.loglog(history_2_of_l2.history['loss'],'b',label='train')
plt.loglog(history_2_of_l2.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,2)
e_hat_2_of_l2 = model_2_of_l2.predict(w2)

plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_of_l2,'g', label='FFNN' , linewidth=2,)

plt.legend()

plt.subplot(2,2,3)
plt.loglog(history_2_wd.history['loss'],'b',label='train')
plt.loglog(history_2_wd.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (Weight decay)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,4)
e_hat_2_wd = model_2_wd.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_wd,'g', label='FFNN' , linewidth=2)

plt.legend()
plt.tight_layout()


## Explore the SGD solver

Exploration

• Learning rate
• Momentum
• Number of minibatches
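The effect of the learning rate and the momentum can be previewed on a toy problem before training any network. The sketch below is a hedged illustration, not the notebook's experiment: it runs the momentum update `v = momentum*v - lr*grad; x = x + v` with full (non-stochastic) gradients on an assumed two-dimensional quadratic loss with curvatures 1 and 100, and the function name `sgd_momentum` is hypothetical.

```python
import numpy as np

def sgd_momentum(lr, momentum, steps=200):
    """Minimize 0.5 * sum(curv * x**2) with the momentum update; return the final loss."""
    curv = np.array([1.0, 100.0])   # assumed curvatures: one flat and one steep direction
    x = np.ones(2)                  # starting point
    v = np.zeros(2)                 # velocity
    for _ in range(steps):
        grad = curv * x
        v = momentum * v - lr * grad
        x = x + v
    return 0.5 * np.sum(curv * x**2)

loss_good = sgd_momentum(lr=0.01, momentum=0.9)     # converges quickly
loss_large = sgd_momentum(lr=0.07, momentum=0.9)    # too large: steep direction diverges
loss_small = sgd_momentum(lr=0.002, momentum=0.9)   # too small: flat direction is slow
loss_low_m = sgd_momentum(lr=0.01, momentum=0.5)    # less momentum: slower convergence

print(loss_good, loss_large, loss_small, loss_low_m)
```

On this toy loss the step is unstable once `lr * curvature` exceeds `2 * (1 + momentum)`, which is why `lr=0.07` blows up along the steep direction. The third item in the list, the number of minibatches, sets how many such updates happen per epoch: with 128 training points and `batch_size=32`, each epoch performs 4 updates.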
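
In [104]:
kernel_weight = 0.003
bias_weight = 0.003

model_3 = models.Sequential(name='noiseLorentzModel_sgd')

# Assumption: only the bias_regularizer line survives in the source; the rest of
# each layer (n_neurons from an earlier cell, tanh, kernel_regularizer) is reconstructed.
# first hidden layer
model_3.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer
model_3.add(layers.Dense(n_neurons, activation='tanh',
                         kernel_regularizer=tf.keras.regularizers.l2(kernel_weight),
                         bias_regularizer=tf.keras.regularizers.l2(bias_weight)))
# output layer, one neuron
model_3.add(layers.Dense(1, activation='linear'))

sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9)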
model_3.compile(loss='MSE',optimizer=sgd)
history_3 = model_3.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=2000, batch_size=32, verbose=0)

In [105]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_3.history['loss'],'b',label='train')
plt.loglog(history_3.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_3 = model_3.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3,'g', label='FFNN' , linewidth=2,)

plt.legend()

Out[105]:

### Too large learning rate

In [106]:
kernel_weight = 0.003
bias_weight = 0.003

model_3_lrL = models.Sequential(name='noiseLorentzModel_sgd_learningRate_Large')

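# Assumption: only the bias_regularizer line survives in the source; the rest of
# each layer (n_neurons from an earlier cell, tanh, kernel_regularizer) is reconstructed.
# first hidden layer
model_3_lrL.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer
model_3_lrL.add(layers.Dense(n_neurons, activation='tanh',
                             kernel_regularizer=tf.keras.regularizers.l2(kernel_weight),
                             bias_regularizer=tf.keras.regularizers.l2(bias_weight)))
# output layer, one neuron
model_3_lrL.add(layers.Dense(1, activation='linear'))

sgd = optimizers.SGD(learning_rate=0.07, momentum=0.9)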
model_3_lrL.compile(loss='MSE',optimizer=sgd)
history_3_lrL = model_3_lrL.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=2000, batch_size=32, verbose=0)

In [115]:
plt.figure(figsize=[15,6])
plt.subplot(2,2,1)
plt.loglog(history_3.history['loss'],'b',label='train')
plt.loglog(history_3.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (lr 0.01)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,2)
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3,'g', label='FFNN' , linewidth=2,)

plt.legend()

plt.subplot(2,2,3)
plt.loglog(history_3_lrL.history['loss'],'b',label='train')
plt.loglog(history_3_lrL.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (lr 0.07)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,4)
e_hat_3_lrL = model_3_lrL.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3_lrL,'g', label='FFNN' , linewidth=2)

plt.legend()
plt.tight_layout()


### Too small learning rate

In [108]:
kernel_weight = 0.003
bias_weight = 0.003

model_3_lrS = models.Sequential(name='noiseLorentzModel_sgd_learningRate_Small')

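# Assumption: only the bias_regularizer line survives in the source; the rest of
# each layer (n_neurons from an earlier cell, tanh, kernel_regularizer) is reconstructed.
# first hidden layer
model_3_lrS.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer
model_3_lrS.add(layers.Dense(n_neurons, activation='tanh',
                             kernel_regularizer=tf.keras.regularizers.l2(kernel_weight),
                             bias_regularizer=tf.keras.regularizers.l2(bias_weight)))
# output layer, one neuron
model_3_lrS.add(layers.Dense(1, activation='linear'))

sgd = optimizers.SGD(learning_rate=0.002, momentum=0.9)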
model_3_lrS.compile(loss='MSE',optimizer=sgd)
history_3_lrS = model_3_lrS.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=2000, batch_size=32, verbose=0)

In [116]:
plt.figure(figsize=[15,6])
plt.subplot(2,2,1)
plt.loglog(history_3.history['loss'],'b',label='train')
plt.loglog(history_3.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (lr 0.01)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,2)
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3,'g', label='FFNN' , linewidth=2,)

plt.legend()

plt.subplot(2,2,3)
plt.loglog(history_3_lrS.history['loss'],'b',label='train')
plt.loglog(history_3_lrS.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (lr 0.002)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,4)
e_hat_3_lrS = model_3_lrS.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3_lrS,'g', label='FFNN' , linewidth=2)

plt.legend()
plt.tight_layout()


### Momentum

In [110]:
kernel_weight = 0.003
bias_weight = 0.003

model_3_mn = models.Sequential(name='noiseLorentzModel_sgd_momentum')

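# Assumption: only the bias_regularizer line survives in the source; the rest of
# each layer (n_neurons from an earlier cell, tanh, kernel_regularizer) is reconstructed.
# first hidden layer
model_3_mn.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer
model_3_mn.add(layers.Dense(n_neurons, activation='tanh',
                            kernel_regularizer=tf.keras.regularizers.l2(kernel_weight),
                            bias_regularizer=tf.keras.regularizers.l2(bias_weight)))
# output layer, one neuron
model_3_mn.add(layers.Dense(1, activation='linear'))

sgd = optimizers.SGD(learning_rate=0.01, momentum=0.5)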
model_3_mn.compile(loss='MSE',optimizer=sgd)
history_3_mn = model_3_mn.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=2000, batch_size=32, verbose=0)

In [117]:
plt.figure(figsize=[15,6])
plt.subplot(2,2,1)
plt.loglog(history_3.history['loss'],'b',label='train')
plt.loglog(history_3.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (Momentum = 0.9)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,2)
# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3,'g', label='FFNN' , linewidth=2)

plt.legend()

plt.subplot(2,2,3)
plt.loglog(history_3_mn.history['loss'],'b',label='train')
plt.loglog(history_3_mn.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (Momentum = 0.5)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,4)
e_hat_3_mn = model_3_mn.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3_mn,'g', label='FFNN' , linewidth=2)

plt.legend()
plt.tight_layout()