CS109A Introduction to Data Science

Standard Section 10: Feed Forward Neural Networks, Regularization, SGD Solver

Harvard University
Fall 2019
Instructors: Pavlos Protopapas, Kevin Rader, and Chris Tanner
Section Leaders: Marios Mattheakis, Abhimanyu (Abhi) Vasishth, Robbert (Rob) Struyven

In [122]:
#RUN THIS CELL 
import requests
from IPython.core.display import HTML
styles = requests.get("https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css").text
HTML(styles)
Out[122]:

The goal of this section is to solve a difficult regression task and to become familiar with regularization and with the solver used to train neural networks (NNs).

Specifically, we will:

  1. Use NNs to solve a regression task where polynomial regression fails
  2. Fit noisy data and observe underfitting and overfitting
  3. Learn about early-stopping and regularization
  4. Explore the SGD solver

Import packages

In [123]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
# from pandas import DataFrame

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures

import copy
import operator

1. Regression: Neural Nets vs. Polynomial Regression

We will try to fit a difficult function where polynomial regression fails.

The dielectric function of many optical materials depends on the frequency and is given by the Lorentz model as: $$ \varepsilon(\omega) = 1 - \frac{\omega_0^2}{\omega_0^2-\omega^2 +i\omega\Gamma},$$ where $\omega$ is the frequency, $\omega_0$ is the resonance frequency of the bound electrons, and $\Gamma$ is the electron damping.

In many situations we measure the real part of the dielectric function in the lab and then want to fit the observations. Let's assume that we perform an experiment and that the observations come from a Lorentz material.
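
Since only the real part is measured, it can help to write it out explicitly. As a short worked step (assuming $\omega$, $\omega_0$ and $\Gamma$ are all real), multiplying the numerator and denominator of the fraction by the complex conjugate of the denominator gives:

$$ \mathrm{Re}\,\varepsilon(\omega) = 1 - \frac{\omega_0^2\,(\omega_0^2-\omega^2)}{(\omega_0^2-\omega^2)^2 + \omega^2\Gamma^2}. $$

At resonance, $\omega=\omega_0$, the numerator vanishes and $\mathrm{Re}\,\varepsilon = 1$. For small $\Gamma$ the function swings sharply on either side of that point, which is plausibly what makes it hard for a low-order polynomial to follow.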

Lorentz model

In [7]:
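def Lorentz(w, w0=1):
    # real part of the Lorentz dielectric function defined above
    Gamma = 7e-2  # electron damping
    # the damping term is i*w*Gamma, as in the formula
    eps = 1 - w0**2/(w0**2 - w**2 + 1j*w*Gamma)
    return eps.real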
In [8]:
plt.figure(figsize=[8,4] )
w = np.linspace(0,2,128)
e = Lorentz(w)
wTest = np.linspace(0.01,1.95, 64)
eTest = Lorentz(wTest) 

plt.plot(w,e,'ob',label='Train')
plt.plot(wTest,eTest,'or',label='Test')
# raw strings keep the LaTeX backslashes from being read as escape sequences
plt.xlabel(r'$\omega$')
plt.ylabel(r'$\epsilon$')
plt.legend()
plt.ylim([-10,10]);

Using polynomial regression to fit the data

In [9]:
x = copy.copy(w)
y = copy.copy(e)

# transforming the data to include another axis
x = x[:, np.newaxis]
y = y[:, np.newaxis]

polynomial_features= PolynomialFeatures(degree=15)
x_poly = polynomial_features.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)
y_poly_pred = model.predict(x_poly)

rmse = np.sqrt(mean_squared_error(y,y_poly_pred))
r2 = r2_score(y,y_poly_pred)
# print(rmse)
# print(r2)


plt.plot(x,y,'ob')
# sort the values of x before line plot
sort_axis = operator.itemgetter(0)
sorted_zip = sorted(zip(x,y_poly_pred), key=sort_axis)
x, y_poly_pred = zip(*sorted_zip)
plt.plot(x, y_poly_pred, color='m',linewidth=2)
plt.show()

Using Neural Networks

Design the Network

In [46]:
model_1 = models.Sequential(name='LorentzModel')

# first hidden layer with 50 neurons (or nodes)
model_1.add(layers.Dense(50, kernel_initializer='uniform', activation='tanh', input_shape=(1,)))
# second hidden layer with 50 neurons (or nodes)
model_1.add(layers.Dense(50, kernel_initializer='uniform', activation='tanh'))

# output layer, one neuron 
model_1.add(layers.Dense(1,  activation='linear'))

# model_1.summary()

Select a solver and train the NN

In [47]:
sgd = optimizers.SGD(lr=0.01, momentum=0.9)
model_1.compile(loss='MSE',optimizer=sgd) 
history_1 = model_1.fit(w, e, validation_data=(wTest,eTest), epochs=800, batch_size= 32, verbose=0) 
# history_1 = model_1.fit(w, e, epochs=800, batch_size=32, verbose=0, validation_split=.1)

Plot the training and validation loss

In [48]:
plt.plot(history_1.history['loss'],'b',label='train')
plt.plot(history_1.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
Out[48]:
Text(0.5, 0, 'Epoch')

Visualize the model prediction

In [49]:
e_hat = model_1.predict(w)

# plot the prediction and the ground truth
plt.plot(w, e,'.r',label='data')
plt.plot(w, e_hat,'b', label='FFNN' )
plt.legend()
Out[49]:

2. Noisy data, underfitting and overfitting

In real experiments there is always noise.

Hence, in reality the observations are given by $$ \varepsilon(\omega) = 1 - \frac{\omega_0^2}{\omega_0^2-\omega^2 +i\omega\Gamma} + \epsilon,$$ where $\epsilon$ is white noise.

Our goal is to discover the underlying law, namely the Lorentz model, by using neural networks.

In [51]:
plt.figure(figsize=[8,4] )
Ntrain = 128
w2 = np.linspace(0,2, Ntrain)
sigNoise = 1
e2_clean = Lorentz(w2) 
e2 = e2_clean +  np.random.normal(loc=0, scale= sigNoise, size=w2.shape)



wTest2 = np.linspace(0.01,1.95, 64)
eTest2 = Lorentz(wTest2) +  np.random.normal(loc=0, scale= sigNoise, size=wTest2.shape)

plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.legend()
plt.xlabel(r'$\omega$')
plt.ylabel(r'$\epsilon$')
plt.ylim([-13,13])
Out[51]:
(-13, 13)

Discover the underlying function

In [63]:
n_neurons = 50

model_2 = models.Sequential(name='noiseLorentzModel')
# first hidden layer 
model_2.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer 
model_2.add(layers.Dense(n_neurons, activation='tanh'))
# output layer, one neuron 
model_2.add(layers.Dense(1,  activation='linear'))
# model_2.summary()
In [64]:
sgd = optimizers.SGD(lr=0.01,momentum=0.9) 
model_2.compile(loss='MSE',optimizer=sgd) 
history_2 = model_2.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=800, batch_size=32, verbose=0)
In [65]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_2.history['loss'],'b',label='train')
plt.loglog(history_2.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_2 = model_2.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2,'g', label='FFNN' , linewidth=2,)

plt.legend()
Out[65]:

Underfitting

We use the same architecture but train for fewer epochs (300 instead of 800).

In [75]:
model_2_uf = models.Sequential(name='noiseLorentzModel_underFitting')

# first hidden layer 
model_2_uf.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer 
model_2_uf.add(layers.Dense(n_neurons, activation='tanh'))
# output layer, one neuron 
model_2_uf.add(layers.Dense(1,  activation='linear'))

sgd = optimizers.SGD(lr=0.01,momentum=0.9)
model_2_uf.compile(loss='MSE',optimizer=sgd) 
history_2_uf = model_2_uf.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=300, batch_size=32, verbose=0)
In [93]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_2_uf.history['loss'],'b',label='train')
plt.loglog(history_2_uf.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_2_uf = model_2_uf.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_uf,'g', label='FFNN' , linewidth=2,)

plt.legend()
Out[93]:

Overfitting

Add more neurons (150 per hidden layer instead of 50) and train for much longer (5000 epochs).

In [74]:
model_2_of = models.Sequential(name='noiseLorentzModel_overFitting')

# first hidden layer 
model_2_of.add(layers.Dense(n_neurons+100, activation='tanh', input_shape=(1,)))
# second hidden layer 
model_2_of.add(layers.Dense(n_neurons+100, activation='tanh'))
# output layer, one neuron 
model_2_of.add(layers.Dense(1,  activation='linear'))

sgd = optimizers.SGD(lr=0.01,momentum=0.9) 
model_2_of.compile(loss='MSE',optimizer=sgd) 
history_2_of = model_2_of.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=5000, batch_size=32, verbose=0)
In [92]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_2_of.history['loss'],'b',label='train')
plt.loglog(history_2_of.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_2_of = model_2_of.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_of,'g', label='FFNN' , linewidth=2,)

plt.legend()
Out[92]:

Regularization

The easiest way to avoid overfitting is early stopping: stop the training when the validation loss reaches its minimum. Early stopping does not change the model. Regularization, on the other hand, does change the model, because it changes the loss function.
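
The early-stopping rule can be sketched without any framework. The helper below, `early_stopping_epoch`, is a hypothetical name and not part of the section's code. It scans a list of validation losses and stops once the loss has failed to improve for `patience` consecutive epochs. The loss values are made up for illustration. In Keras the same idea is available through the `tf.keras.callbacks.EarlyStopping` callback (with `monitor`, `patience` and `restore_best_weights` arguments), which is not used in this section.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for a patience-based early-stopping rule.

    Training stops at the first epoch where the validation loss has not
    improved on its best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # patience never ran out: training runs to the last epoch
    return len(val_losses) - 1, best_epoch


# hypothetical validation curve: it falls at first, then rises (overfitting)
val_losses = [5.0, 3.0, 2.0, 1.5, 1.2, 1.3, 1.4, 1.6, 1.9, 2.3]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stop at epoch {stop_epoch}, best validation loss at epoch {best_epoch}")
```

The same function could be applied to a recorded curve such as `history_2_of.history['val_loss']` to see where training would have stopped.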

Two common regularization methods are the so-called $L_1$ and $L_2$.

  • $L_1$ penalizes the sum of the absolute values of the network parameters, which drives many of them to exactly zero and so reduces the model complexity. In other words, $L_1$ favors a sparse network with as few non-zero parameters as possible.
  • $L_2$ penalizes the sum of the squared parameters, which shrinks all of them toward small values and gives a more stable network. It does not care how many parameters are non-zero, only how large they are.

Warning! In the limit of very large regularization coefficients, both $L_1$ and $L_2$ drive all parameters toward zero. Hence, overusing regularization leads to underfitting.
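
To make "changing the loss function" concrete, here is a minimal sketch of the two penalty terms added to a mean squared error. The weights, targets and predictions are hypothetical toy values, and the helper names are illustrative. The penalty forms, coefficient times the sum of absolute values for $L_1$ and coefficient times the sum of squares for $L_2$, are assumed to match what `tf.keras.regularizers.l1` and `tf.keras.regularizers.l2` add to the loss.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def l1_penalty(weights, coeff):
    """L1 term: coeff * sum of |w|."""
    return coeff * sum(abs(w) for w in weights)


def l2_penalty(weights, coeff):
    """L2 term: coeff * sum of w**2."""
    return coeff * sum(w * w for w in weights)


# hypothetical toy values, for illustration only
weights = [0.5, -2.0, 0.0, 1.5]
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]

data_loss = mse(y_true, y_pred)
for coeff in (0.003, 1.0):
    loss_l1 = data_loss + l1_penalty(weights, coeff)
    loss_l2 = data_loss + l2_penalty(weights, coeff)
    print(f"coeff={coeff}: data={data_loss:.4f}  L1 total={loss_l1:.4f}  L2 total={loss_l2:.4f}")
```

With the small coefficient the penalty is a minor correction to the data loss. With a coefficient of 1 it dominates, so the optimizer is rewarded mainly for shrinking the weights, which is the underfitting regime described in the warning.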

Weight decay is another common way to regularize a network. After each weight update, the weights are multiplied by a factor slightly less than 1. This prevents the weights from growing too large, and can be seen as gradient descent on a quadratic regularization term.
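
The link between weight decay and a quadratic penalty can be checked on a single scalar weight. The sketch below assumes plain SGD without momentum and a penalty of the form $(\lambda/2)\,w^2$. The function names and numbers are illustrative and not taken from the section's code.

```python
def sgd_step_with_l2(w, grad_data, lr, lam):
    """One gradient step on data_loss + (lam / 2) * w**2."""
    return w - lr * (grad_data + lam * w)


def sgd_step_with_decay(w, grad_data, lr, lam):
    """Shrink the weight by (1 - lr * lam), then take the plain data-loss step."""
    return (1.0 - lr * lam) * w - lr * grad_data


# hypothetical values: current weight, data-loss gradient, learning rate, penalty strength
w, grad_data, lr, lam = 2.0, 0.5, 0.01, 0.1

w_l2 = sgd_step_with_l2(w, grad_data, lr, lam)
w_decay = sgd_step_with_decay(w, grad_data, lr, lam)
print(w_l2, w_decay)  # the two updates agree up to floating-point rounding
```

Here the decay factor is `1 - lr * lam = 0.999`, which is the "factor slightly less than 1" mentioned above. With momentum or adaptive optimizers the two formulations are in general no longer identical.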

$L_1$ Regularizer

In [78]:
kernel_weight = 0.003
bias_weight = 0.003

model_2_of_l1 = models.Sequential(name='noiseLorentzModel_l1')

# first hidden layer 
model_2_of_l1.add(layers.Dense(n_neurons+100, activation='tanh', input_shape=(1,)))
# second hidden layer 
model_2_of_l1.add(layers.Dense(n_neurons+100, activation='tanh', kernel_regularizer=tf.keras.regularizers.l1(kernel_weight), 
                                    bias_regularizer=tf.keras.regularizers.l1(bias_weight) ))
# output layer, one neuron 
model_2_of_l1.add(layers.Dense(1,  activation='linear'))

sgd = optimizers.SGD(lr=0.01, momentum=0.9) 
model_2_of_l1.compile(loss='MSE',optimizer=sgd) 
history_2_of_l1 = model_2_of_l1.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=5000, batch_size=32, verbose=0)
In [94]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_2_of_l1.history['loss'],'b',label='train')
plt.loglog(history_2_of_l1.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_2_of_l1 = model_2_of_l1.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_of_l1,'g', label='FFNN' , linewidth=2,)

plt.legend()
Out[94]:

$L_2$ Regularizer

In [80]:
kernel_weight = 0.003
bias_weight = 0.003

model_2_of_l2 = models.Sequential(name='noiseLorentzModel_l2')

# first hidden layer 
model_2_of_l2.add(layers.Dense(n_neurons+100, activation='tanh', input_shape=(1,)))
# second hidden layer 
model_2_of_l2.add(layers.Dense(n_neurons+100, activation='tanh', kernel_regularizer=tf.keras.regularizers.l2(kernel_weight), 
                                    bias_regularizer=tf.keras.regularizers.l2(bias_weight) ))
# output layer, one neuron 
model_2_of_l2.add(layers.Dense(1,  activation='linear'))

sgd = optimizers.SGD(lr=0.01,momentum=0.9) 
model_2_of_l2.compile(loss='MSE',optimizer=sgd) 
history_2_of_l2 = model_2_of_l2.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=5000, batch_size=32, verbose=0)
In [95]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_2_of_l2.history['loss'],'b',label='train')
plt.loglog(history_2_of_l2.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_2_of_l2 = model_2_of_l2.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_of_l2,'g', label='FFNN' , linewidth=2,)

plt.legend()
Out[95]:

Underfitting through Regularization

In [83]:
kernel_weight = 1
bias_weight = 1

model_2_of_l2_uf = models.Sequential(name='noiseLorentzModel_l2_uf')

# first hidden layer 
model_2_of_l2_uf.add(layers.Dense(n_neurons+100, activation='tanh', input_shape=(1,)))
# second hidden layer 
model_2_of_l2_uf.add(layers.Dense(n_neurons+100, activation='tanh', kernel_regularizer=tf.keras.regularizers.l2(kernel_weight), 
                                    bias_regularizer=tf.keras.regularizers.l2(bias_weight) ))
# output layer, one neuron 
model_2_of_l2_uf.add(layers.Dense(1,  activation='linear'))

sgd = optimizers.SGD(lr=0.01,momentum=0.9, ) 
model_2_of_l2_uf.compile(loss='MSE',optimizer=sgd) 
history_2_of_l2_uf = model_2_of_l2_uf.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=5000, batch_size=32, verbose=0)
In [96]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_2_of_l2_uf.history['loss'],'b',label='train')
plt.loglog(history_2_of_l2_uf.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_2_of_l2_uf = model_2_of_l2_uf.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_of_l2_uf,'g', label='FFNN' , linewidth=2,)

plt.legend()
Out[96]:
In [86]:
# inspect the weights of the over-regularized L2 model
w_l2_uf = model_2_of_l2_uf.get_weights()
print(w_l2_uf)
[array([[-2.88417112e-07,  1.27527655e-05, -1.16637480e-07,
         1.27158529e-07, -4.84074462e-07,  1.43536795e-06,
        -6.86162548e-06, -1.04252172e-06, -2.55289251e-06,
        -3.07931236e-06,  3.92774746e-06,  4.16653438e-06,
         1.18414084e-06, -2.58007901e-07, -2.13642478e+00,
         2.44120292e-06, -8.36712161e-07, -9.88112902e-07,
        -9.44330736e-07, -1.05971071e-06,  1.96455017e-06,
        -1.45048972e-07, -3.33212574e-06, -4.09947489e-07,
        -2.95707082e-06,  1.24698761e-06, -3.00997317e-06,
        -1.00888576e-06,  1.24241026e-07, -7.72184558e-06,
        -6.22474681e-06, -2.13643265e+00,  1.23318694e-06,
         2.50667017e-06,  1.11350982e-06, -1.76371100e-06,
        -2.08186165e-07,  8.18114495e-06, -8.95465348e-07,
        -2.61973163e-08,  5.01561495e-08, -3.92719448e-06,
         2.29828856e-07, -3.77010508e-07, -5.46358422e-07,
         2.74788044e-06,  2.31198260e-06,  1.33440528e-06,
         4.40988288e-07,  3.49153220e-06, -1.82652673e-06,
        -1.16432284e-06,  9.49320747e-06,  2.44958500e-07,
         6.31779358e-07, -1.73797844e-07, -3.16343380e-06,
         1.41480348e-06,  1.34047468e-07,  5.35134041e-06,
         1.45339445e-05, -2.37776976e-06, -1.65201016e-06,
        -1.77797267e-06, -1.07609947e-06, -2.80373615e-06,
         2.78306516e-06, -2.20638890e-06,  2.28984800e-06,
         8.36447100e-07,  5.35679101e-06, -3.97811118e-06,
         1.08231472e-06, -3.13787109e-06,  1.65237066e-06,
        -1.21595713e-06, -1.33467574e-06,  1.50074516e-06,
        -2.79652591e-06,  2.95822474e-07, -6.01960392e-06,
         1.74636375e-06,  1.83343036e-06,  6.28173609e-07,
        -1.95113967e-06, -1.28427788e-03, -3.50337518e-06,
        -4.88337264e-07,  5.53414850e-07,  1.44616006e-05,
         7.85187922e-07,  1.02689373e-05, -2.28825661e-06,
        -2.66807039e-08, -3.05666367e-06, -4.81291090e-06,
        -1.86799230e-06,  2.28896988e-06, -1.89967795e-06,
        -9.42158522e-07,  5.87020713e-06,  2.95284985e-06,
        -6.90401566e-06, -1.34891900e-06,  1.49266998e-06,
        -2.48025003e-06, -6.95027724e-09, -3.59443021e-07,
        -5.06257685e-08,  1.53962992e-06,  3.11969529e-06,
         1.86097225e-06,  1.11225427e-06, -9.04990543e-07,
        -4.07939842e-06, -6.20864625e-07, -1.69429518e-06,
         2.36994651e-06, -3.33540584e-07, -1.73009039e-06,
         3.42653379e-06, -3.57558179e-07, -6.63855474e-07,
        -1.17998559e-06,  2.31222032e-07,  1.74490893e-07,
        -1.98208249e-06, -8.47229160e-07,  3.86745069e-06,
         6.91940045e-07,  5.81900736e-07,  2.93065386e-06,
         3.74254967e-07,  3.76247864e+01,  2.02175806e-06,
         6.16663058e-08,  3.79456242e-06,  1.55372391e-06,
        -2.37112431e-07,  3.39812402e-07,  2.01163607e-06,
        -1.54178542e-06, -6.63706942e-07, -8.31436864e-06,
        -1.38172913e-06, -6.23475330e-07,  2.63843117e-06,
        -1.49550465e-06,  9.01490580e-07, -1.31065747e-06]], dtype=float32), array([ 3.7592869e-07, -1.6622209e-05,  1.5202757e-07, -1.6574100e-07,
        6.3095223e-07, -1.8708870e-06,  8.9435798e-06,  1.3588437e-06,
        3.3274912e-06,  4.0136370e-06, -5.1195061e-06, -5.4307443e-06,
       -1.5434324e-06,  3.3629269e-07,  2.2912655e+00, -3.1819125e-06,
        1.0905874e-06,  1.2879261e-06,  1.2308592e-06,  1.3812482e-06,
       -2.5606346e-06,  1.8905975e-07,  4.3431592e-06,  5.3433371e-07,
        3.8543044e-06, -1.6253483e-06,  3.9232591e-06,  1.3150022e-06,
       -1.6193825e-07,  1.0064806e-05,  8.1134604e-06,  2.2912762e+00,
       -1.6073602e-06, -3.2672438e-06, -1.4513711e-06,  2.2988550e-06,
        2.7135400e-07, -1.0663469e-05,  1.1671672e-06,  3.4146101e-08,
       -6.5374536e-08,  5.1187817e-06, -2.9956351e-07,  4.9140294e-07,
        7.1213458e-07, -3.5816427e-06, -3.0134845e-06, -1.7392907e-06,
       -5.7479281e-07, -4.5509350e-06,  2.3807311e-06,  1.5176017e-06,
       -1.2373635e-05, -3.1928386e-07, -8.2347390e-07,  2.2653154e-07,
        4.1232829e-06, -1.8440833e-06, -1.7472017e-07, -6.9750436e-06,
       -1.8943836e-05,  3.0992312e-06,  2.1532626e-06,  2.3174455e-06,
        1.4026092e-06,  3.6544457e-06, -3.6275023e-06,  2.8758509e-06,
       -2.9846337e-06, -1.0902418e-06, -6.9821485e-06,  5.1851503e-06,
       -1.4107104e-06,  4.0899627e-06, -2.1537328e-06,  1.5849027e-06,
        1.7396429e-06, -1.9561012e-06,  3.6450476e-06, -3.8558096e-07,
        7.8460716e-06, -2.2762449e-06, -2.3897296e-06, -8.1877420e-07,
        2.5431543e-06,  1.6739520e-03,  4.5663692e-06,  6.3650833e-07,
       -7.2133196e-07, -1.8849534e-05, -1.0234299e-06, -1.3384738e-05,
        2.9825590e-06,  3.4776157e-08,  3.9841166e-06,  6.2732429e-06,
        2.4347773e-06, -2.9834887e-06,  2.4760775e-06,  1.2280282e-06,
       -7.6513461e-06, -3.8488020e-06,  8.9988316e-06,  1.7582079e-06,
       -1.9455763e-06,  3.2328071e-06,  9.0591303e-09,  4.6850516e-07,
        6.5986647e-08, -2.0067844e-06, -4.0662717e-06, -2.4256281e-06,
       -1.4497344e-06,  1.1795827e-06,  5.3171707e-06,  8.0924724e-07,
        2.2083780e-06, -3.0890351e-06,  4.3474347e-07,  2.2550339e-06,
       -4.4662129e-06,  4.6604848e-07,  8.6528229e-07,  1.5380162e-06,
       -3.0137940e-07, -2.2743490e-07,  2.5834863e-06,  1.1042954e-06,
       -5.0409117e-06, -9.0188843e-07, -7.5846066e-07, -3.8198737e-06,
       -4.8781129e-07, -3.7685844e+01, -2.6351997e-06, -8.0377028e-08,
       -4.9459086e-06, -2.0251546e-06,  3.0905704e-07, -4.4291832e-07,
       -2.6220064e-06,  2.0095933e-06,  8.6508879e-07,  1.0837116e-05,
        1.8009732e-06,  8.1265017e-07, -3.4389827e-06,  1.9492704e-06,
       -1.1750211e-06,  1.7083370e-06], dtype=float32), array([[-4.10886258e-09,  5.44422241e-09,  1.12696563e-08, ...,
         1.01308084e-09, -2.68624301e-09,  5.22214716e-10],
       [ 1.81679098e-07, -2.40723665e-07, -4.98303621e-07, ...,
        -4.47947635e-08,  1.18775944e-07, -2.30904611e-08],
       [-1.66164627e-09,  2.20167062e-09,  4.55750637e-09, ...,
         4.09694917e-10, -1.08633103e-09,  2.11186527e-10],
       ...,
       [-2.13053113e-08,  2.82294366e-08,  5.84355710e-08, ...,
         5.25303667e-09, -1.39287302e-08,  2.70779421e-09],
       [ 1.28428628e-08, -1.70167311e-08, -3.52249856e-08, ...,
        -3.16653326e-09,  8.39625081e-09, -1.63226066e-09],
       [-1.86719529e-08,  2.47402276e-08,  5.12128402e-08, ...,
         4.60375515e-09, -1.22071242e-08,  2.37310638e-09]], dtype=float32), array([ 6.9263903e-04, -9.1537647e-04, -1.8582679e-03, -8.6006563e-05,
       -1.8961675e-04, -6.3404790e-04,  7.1726227e-04,  1.4960077e-03,
       -1.9075507e-03, -1.0198024e-03, -5.8383564e-04, -7.7785971e-04,
        9.9176634e-04,  4.9567968e-04, -1.3956660e-03, -9.3661644e-04,
        6.8847183e-04, -8.4538641e-04, -8.4630301e-05,  3.3667600e-03,
        1.1513699e-03,  7.3868083e-04, -4.8228004e-04,  3.8174330e-04,
        6.6337176e-04,  8.4649015e-05, -8.9763314e-05,  4.1558500e-04,
        2.8803863e-04,  1.4788480e-03,  1.4005979e-03,  1.9555381e-03,
       -8.4131747e-04,  7.6824636e-04, -8.0400496e-04, -1.7430540e-03,
        1.0631359e-03,  1.3165418e-03, -7.8513753e-04, -2.4282478e-04,
       -1.8962333e-04, -8.7219966e-04, -4.6748249e-04,  2.4226133e-04,
        1.7403658e-03, -2.0569918e-04, -2.9222993e-04, -1.5053046e-03,
        2.1739944e-04,  2.3497007e-04, -2.9648544e-04, -6.9273449e-04,
        7.4854237e-04,  7.7706459e-04,  4.6976656e-03, -3.3625832e-04,
       -1.1255499e-03,  8.9332275e-04,  4.9856654e-04,  5.5233564e-04,
       -3.2161933e-04, -5.1686913e-04, -6.3532940e-04, -2.9614073e-04,
       -9.3045761e-05,  6.2411098e-05,  3.5030756e-04,  5.4064149e-04,
       -1.4655702e-04, -1.7123058e-04, -3.6407053e-04,  8.8821677e-04,
        2.1117995e-04,  2.3532979e-04,  8.7957387e-04, -4.6700588e-04,
       -1.7037743e-04, -5.2308675e-04, -3.7289131e-04,  1.6234192e-04,
        3.4899553e-03, -3.5835779e-04, -2.1936349e-04,  1.3014348e-04,
        1.2250510e-03, -1.6847672e-04, -5.9803110e-04, -4.9893497e-05,
       -1.0368922e-03, -4.0937553e-04, -9.9769700e-04,  5.7169644e-04,
       -7.9602702e-04, -9.8119001e-04, -3.8751878e-04,  2.2816714e-03,
        3.1650392e-04, -8.9043914e-04,  3.7664629e-04, -6.5263448e-05,
       -6.8433350e-04, -1.9627070e-04, -3.5797816e-04,  2.1901599e-04,
        1.7351424e-02, -1.0822411e-03,  2.6112271e-04,  6.9150538e-04,
       -6.4443261e-04,  6.1023631e-04,  1.8032314e-03,  4.3882115e-04,
       -1.0913245e-03,  6.5822940e-05,  9.7107654e-04, -2.3037521e-03,
        9.0284692e-04,  7.2213076e-04, -8.5996324e-04,  2.6774083e-05,
        7.0166495e-04,  2.2808732e-03, -2.1146849e-04, -1.4010730e-04,
        3.8797373e-04,  5.9421221e-04,  1.8837927e-03,  3.9837512e-05,
       -6.1478605e-04,  9.8467595e-04,  2.6815565e-04,  4.2062858e-04,
       -8.0611953e-04,  3.6163151e-04,  6.7630230e-05, -5.8004342e-04,
        1.2767658e-04,  9.3256589e-04, -1.4607129e-03, -1.1284142e-03,
       -2.3228749e-05, -6.9869775e-04, -4.3664465e-04,  9.5241633e-04,
        5.8255694e-04, -7.3297400e-05, -1.3849903e-03, -1.7132581e-04,
        4.5371219e-04, -8.8329078e-05], dtype=float32), array([[-0.22093561],
       [ 0.29252952],
       [ 0.60225844],
       [ 0.02736716],
       [ 0.06034451],
       [ 0.2021632 ],
       [-0.2288298 ],
       [-0.48164615],
       [ 0.6188689 ],
       [ 0.3262449 ],
       [ 0.18609525],
       [ 0.24827772],
       [-0.31718072],
       [-0.15791807],
       [ 0.44864646],
       [ 0.29937983],
       [-0.2196003 ],
       [ 0.2699878 ],
       [ 0.02692894],
       [-1.1446787 ],
       [-0.36888862],
       [-0.23570329],
       [ 0.15363969],
       [-0.12155622],
       [-0.21155523],
       [-0.02693519],
       [ 0.02856238],
       [-0.13235079],
       [-0.09168809],
       [-0.47599408],
       [-0.45026237],
       [-0.6350904 ],
       [ 0.26867983],
       [-0.24519327],
       [ 0.2566829 ],
       [ 0.5636329 ],
       [-0.34026867],
       [-0.4227223 ],
       [ 0.2506168 ],
       [ 0.07728675],
       [ 0.06034685],
       [ 0.2786174 ],
       [ 0.14891388],
       [-0.07710703],
       [-0.5627314 ],
       [ 0.0654643 ],
       [ 0.09302402],
       [ 0.48471424],
       [-0.06918968],
       [-0.07478518],
       [ 0.09437988],
       [ 0.2209659 ],
       [-0.23886694],
       [-0.24802363],
       [-1.7416492 ],
       [ 0.10705478],
       [ 0.36050543],
       [-0.28542224],
       [-0.15884045],
       [-0.17602113],
       [ 0.10238892],
       [ 0.16468497],
       [ 0.20257276],
       [ 0.09427051],
       [ 0.02960727],
       [-0.01985889],
       [-0.11153305],
       [-0.1722837 ],
       [ 0.04663718],
       [ 0.05449111],
       [ 0.11591919],
       [-0.28377602],
       [-0.06720978],
       [-0.07489955],
       [-0.28099436],
       [ 0.14876083],
       [ 0.0542204 ],
       [ 0.16667284],
       [ 0.11873418],
       [-0.05166222],
       [-1.1933751 ],
       [ 0.11409934],
       [ 0.06981548],
       [-0.04141351],
       [-0.39285943],
       [ 0.05361446],
       [ 0.1906363 ],
       [ 0.0158755 ],
       [ 0.33177376],
       [ 0.13036917],
       [ 0.31909755],
       [-0.18221223],
       [ 0.2541176 ],
       [ 0.31376725],
       [ 0.12339851],
       [-0.7468232 ],
       [-0.10075898],
       [ 0.28449127],
       [-0.11993048],
       [ 0.0207664 ],
       [ 0.21827388],
       [ 0.06246376],
       [ 0.11397963],
       [-0.06970492],
       [-5.026797  ],
       [ 0.3464622 ],
       [-0.08311414],
       [-0.22057138],
       [ 0.20549016],
       [-0.19454165],
       [-0.5837827 ],
       [-0.13976452],
       [ 0.3494015 ],
       [-0.02094436],
       [-0.3104995 ],
       [ 0.7544901 ],
       [-0.2884907 ],
       [-0.230394  ],
       [ 0.2746778 ],
       [-0.00851912],
       [-0.22382897],
       [-0.74654615],
       [ 0.06730189],
       [ 0.04458465],
       [-0.12354336],
       [-0.18941501],
       [-0.61085904],
       [-0.01267572],
       [ 0.19599682],
       [-0.31489128],
       [-0.0853548 ],
       [-0.13396068],
       [ 0.25735834],
       [-0.11514252],
       [-0.02151958],
       [ 0.18488023],
       [-0.04062889],
       [-0.29807293],
       [ 0.4700226 ],
       [ 0.36143357],
       [ 0.00739107],
       [ 0.22287814],
       [ 0.13906974],
       [-0.30447423],
       [-0.18568598],
       [ 0.02332265],
       [ 0.44514015],
       [ 0.05452228],
       [-0.14451884],
       [ 0.02810591]], dtype=float32), array([0.86910945], dtype=float32)]

Weight decay & clipping

In [88]:
kernel_weight = 0.003
bias_weight = 0.003

model_2_wd = models.Sequential(name='noiseLorentzModel_sgd_weightdecay')

# first hidden layer 
model_2_wd.add(layers.Dense(n_neurons+100, activation='tanh', input_shape=(1,)))
# second hidden layer 
model_2_wd.add(layers.Dense(n_neurons+100, activation='tanh', kernel_regularizer=tf.keras.regularizers.l2(kernel_weight), 
                                    bias_regularizer=tf.keras.regularizers.l2(bias_weight) ))
# output layer, one neuron 
model_2_wd.add(layers.Dense(1,  activation='linear'))

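# clipvalue clips each gradient component to [-0.5, 0.5]
# note: in tf.keras, `decay` shrinks the learning rate over iterations;
# the penalty on the weights themselves comes from the L2 regularizers above
sgd = optimizers.SGD(lr=0.01,momentum=0.9, clipvalue=0.5, decay=1e-3)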
model_2_wd.compile(loss='MSE',optimizer=sgd) 
history_2_wd = model_2_wd.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=5000, batch_size=32, verbose=0)
In [98]:
plt.figure(figsize=[15,6])
plt.subplot(2,2,1)
plt.loglog(history_2_of_l2.history['loss'],'b',label='train')
plt.loglog(history_2_of_l2.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,2)
e_hat_2_of_l2 = model_2_of_l2.predict(w2)

plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_of_l2,'g', label='FFNN' , linewidth=2,)

plt.legend()


plt.subplot(2,2,3)
plt.loglog(history_2_wd.history['loss'],'b',label='train')
plt.loglog(history_2_wd.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (Weight decay)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,4)
e_hat_2_wd = model_2_wd.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_2_wd,'g', label='FFNN' , linewidth=2)

plt.legend()
plt.tight_layout()

Explore the SGD solver

Exploration

  • Learning rate
  • Momentum
  • Number of minibatches
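
Before turning these knobs in Keras, it may help to see where each one enters the update. The following is a minimal framework-free sketch of mini-batch SGD with momentum on a hypothetical one-parameter problem (fitting the slope of $y = 3x$). It assumes the update form `velocity = momentum * velocity - lr * grad` followed by `weight += velocity`. All names and values are illustrative.

```python
import random


def sgd_momentum(xs, ys, lr=0.01, momentum=0.9, batch_size=4, epochs=300, seed=0):
    """Fit y ~ a * x by minimizing the MSE with mini-batch SGD plus momentum."""
    rng = random.Random(seed)
    a, velocity = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # draw new mini-batches every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # gradient of the mini-batch MSE with respect to a
            grad = sum(2 * (a * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            velocity = momentum * velocity - lr * grad
            a += velocity
    return a


# hypothetical noiseless data with true slope 3
xs = [i / 10 for i in range(1, 17)]
ys = [3.0 * x for x in xs]

a_hat = sgd_momentum(xs, ys)
print(f"estimated slope: {a_hat:.4f}")
```

The number of mini-batches per epoch is the number of samples divided by `batch_size`: 16 / 4 = 4 in this sketch, and 128 / 32 = 4 for the training set and `batch_size=32` used in this section.
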
In [104]:
kernel_weight = 0.003
bias_weight = 0.003

model_3 = models.Sequential(name='noiseLorentzModel_sgd')

# first hidden layer 
model_3.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer 
model_3.add(layers.Dense(n_neurons, activation='tanh', kernel_regularizer=tf.keras.regularizers.l2(kernel_weight), 
                                    bias_regularizer=tf.keras.regularizers.l2(bias_weight) ))
# output layer, one neuron 
model_3.add(layers.Dense(1,  activation='linear'))

sgd = optimizers.SGD(lr=0.01,momentum=0.9)
model_3.compile(loss='MSE',optimizer=sgd) 
history_3 = model_3.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=2000, batch_size=32, verbose=0)
In [105]:
plt.figure(figsize=[15,6])
plt.subplot(1,2,1)
plt.loglog(history_3.history['loss'],'b',label='train')
plt.loglog(history_3.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(1,2,2)
e_hat_3 = model_3.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3,'g', label='FFNN' , linewidth=2,)

plt.legend()
Out[105]:

Learning Rate
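
As a rough guide to what to expect from the two experiments in this part, the effect of the learning rate can be seen on a one-dimensional quadratic. This is a sketch under simplifying assumptions (plain gradient descent, no momentum, a hypothetical curvature of 10), not a model of the network's actual loss surface. For $f(x) = \frac{1}{2} c\,x^2$ each step multiplies $x$ by $(1 - \text{lr}\cdot c)$, so the iteration diverges once the learning rate exceeds $2/c$.

```python
def gradient_descent(lr, curvature=10.0, x0=1.0, steps=50):
    """Minimize f(x) = 0.5 * curvature * x**2; return |x| after `steps` updates."""
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x  # the gradient of f is curvature * x
    return abs(x)


# too small, reasonable, and too large (the divergence threshold is 2 / 10 = 0.2)
for lr in (0.001, 0.05, 0.21):
    print(f"lr={lr}: |x| after 50 steps = {gradient_descent(lr):.3e}")
```

A learning rate that is too small still converges but makes little progress in the available steps, while one that is too large overshoots the minimum on every step and grows without bound.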

Too large learning rate

In [106]:
kernel_weight = 0.003
bias_weight = 0.003

model_3_lrL = models.Sequential(name='noiseLorentzModel_sgd_learningRate_Large')

# first hidden layer 
model_3_lrL.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer 
model_3_lrL.add(layers.Dense(n_neurons, activation='tanh', kernel_regularizer=tf.keras.regularizers.l2(kernel_weight), 
                                    bias_regularizer=tf.keras.regularizers.l2(bias_weight) ))
# output layer, one neuron 
model_3_lrL.add(layers.Dense(1,  activation='linear'))

sgd = optimizers.SGD(lr=0.07,momentum=0.9)
model_3_lrL.compile(loss='MSE',optimizer=sgd) 
history_3_lrL = model_3_lrL.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=2000, batch_size=32, verbose=0)
In [115]:
plt.figure(figsize=[15,6])
plt.subplot(2,2,1)
plt.loglog(history_3.history['loss'],'b',label='train')
plt.loglog(history_3.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (lr 0.01)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,2)
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3,'g', label='FFNN' , linewidth=2,)

plt.legend()

plt.subplot(2,2,3)
plt.loglog(history_3_lrL.history['loss'],'b',label='train')
plt.loglog(history_3_lrL.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (lr 0.07)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,4)
e_hat_3_lrL = model_3_lrL.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3_lrL,'g', label='FFNN' , linewidth=2)

plt.legend()
plt.tight_layout()

Too small learning rate

In [108]:
kernel_weight = 0.003
bias_weight = 0.003

model_3_lrS = models.Sequential(name='noiseLorentzModel_sgd_learningRate_Small')

# first hidden layer 
model_3_lrS.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer 
model_3_lrS.add(layers.Dense(n_neurons, activation='tanh', kernel_regularizer=tf.keras.regularizers.l2(kernel_weight), 
                                    bias_regularizer=tf.keras.regularizers.l2(bias_weight) ))
# output layer, one neuron 
model_3_lrS.add(layers.Dense(1,  activation='linear'))

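# SGD with a deliberately small learning rate (the reference model_3 is plotted as lr 0.01)
sgd = optimizers.SGD(learning_rate=0.002, momentum=0.9)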
model_3_lrS.compile(loss='MSE',optimizer=sgd) 
history_3_lrS = model_3_lrS.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=2000, batch_size=32, verbose=0)
In [116]:
plt.figure(figsize=[15,6])
plt.subplot(2,2,1)
plt.loglog(history_3.history['loss'],'b',label='train')
plt.loglog(history_3.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (lr 0.01)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,2)
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3, 'g', label='FFNN', linewidth=2)

plt.legend()

plt.subplot(2,2,3)
plt.loglog(history_3_lrS.history['loss'],'b',label='train')
plt.loglog(history_3_lrS.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (lr 0.002)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,4)
e_hat_3_lrS = model_3_lrS.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3_lrS,'g', label='FFNN' , linewidth=2)

plt.legend()
plt.tight_layout()
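
The effect of the learning rate can also be seen on a toy problem. The sketch below is not the network from this section: it runs plain gradient descent (no momentum, no mini-batches) on the one-dimensional quadratic loss $L(x) = \frac{1}{2} a x^2$. Each step multiplies $x$ by $(1 - \eta a)$, so the iteration converges only when $0 < \eta < 2/a$. The function name `gradient_descent_losses`, the curvature `a = 10`, and the three learning rates are assumptions chosen for illustration.

```python
import numpy as np

def gradient_descent_losses(learning_rate, n_steps=50, x0=1.0, curvature=10.0):
    """Run plain gradient descent on L(x) = 0.5 * curvature * x**2.

    Returns the loss before the first step and after every step.
    """
    x = x0
    losses = [0.5 * curvature * x**2]
    for _ in range(n_steps):
        grad = curvature * x          # dL/dx
        x = x - learning_rate * grad  # gradient descent update
        losses.append(0.5 * curvature * x**2)
    return np.array(losses)

# Hypothetical learning rates; with curvature 10 the stability limit is 2/10 = 0.2
losses_large = gradient_descent_losses(learning_rate=0.25)   # above the limit: loss grows
losses_good = gradient_descent_losses(learning_rate=0.05)    # inside the limit: fast decay
losses_small = gradient_descent_losses(learning_rate=0.001)  # stable but slow

for name, losses in [('large', losses_large), ('good', losses_good), ('small', losses_small)]:
    print(f'{name:>5}: initial loss = {losses[0]:.3e}, final loss = {losses[-1]:.3e}')
```

A real network has no single curvature, so there is no closed-form limit on the learning rate, but the same three regimes (unstable, well tuned, too slow) are what the loss curves above should be read against.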

Momentum

In [110]:
kernel_weight = 0.003
bias_weight = 0.003

model_3_mn = models.Sequential(name='noiseLorentzModel_sgd_momentum')

# first hidden layer 
model_3_mn.add(layers.Dense(n_neurons, activation='tanh', input_shape=(1,)))
# second hidden layer 
model_3_mn.add(layers.Dense(n_neurons, activation='tanh', kernel_regularizer=tf.keras.regularizers.l2(kernel_weight), 
                                    bias_regularizer=tf.keras.regularizers.l2(bias_weight) ))
# output layer, one neuron 
model_3_mn.add(layers.Dense(1,  activation='linear'))

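# same learning rate as the reference plot (lr 0.01), but momentum lowered from 0.9 to 0.5
sgd = optimizers.SGD(learning_rate=0.01, momentum=0.5)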
model_3_mn.compile(loss='MSE',optimizer=sgd) 
history_3_mn = model_3_mn.fit(w2, e2, validation_data=(wTest2,eTest2), epochs=2000, batch_size=32, verbose=0)
In [117]:
plt.figure(figsize=[15,6])
plt.subplot(2,2,1)
plt.loglog(history_3.history['loss'],'b',label='train')
plt.loglog(history_3.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (Momentum = 0.9)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,2)
# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3,'g', label='FFNN' , linewidth=2)

plt.legend()
plt.tight_layout()

# bottom row: model trained with momentum = 0.5

plt.subplot(2,2,3)
plt.loglog(history_3_mn.history['loss'],'b',label='train')
plt.loglog(history_3_mn.history['val_loss'],'r', label='val')
plt.legend(loc='upper right')

plt.title('Model loss (Momentum = 0.5)')
plt.ylabel('Loss')
plt.xlabel('Epoch')

plt.subplot(2,2,4)
e_hat_3_mn = model_3_mn.predict(w2)

# plot the prediction and the ground truth
plt.plot(w2,e2,'ob',label='Train')
plt.plot(wTest2,eTest2,'or',label='Test')
plt.plot(w2,e2_clean,'-k',label='Real', linewidth=2)
plt.plot(w2, e_hat_3_mn,'g', label='FFNN' , linewidth=2)

plt.legend()
plt.tight_layout()
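
To see how the momentum term changes the effective step, it can help to write the update out by hand. The sketch below assumes the classical (non-Nesterov) rule documented for the Keras SGD optimizer, `velocity = momentum * velocity - learning_rate * gradient` followed by `w = w + velocity`, and applies it to a toy quadratic loss $L(w) = \frac{1}{2} a w^2$ rather than to the network above. The function name `momentum_final_loss` and all default values are assumptions chosen for illustration.

```python
def momentum_final_loss(momentum, learning_rate=0.01, n_steps=200, w0=1.0, curvature=1.0):
    """Minimize L(w) = 0.5 * curvature * w**2 with classical momentum.

    Returns the loss after n_steps updates.
    """
    w, velocity = w0, 0.0
    for _ in range(n_steps):
        grad = curvature * w                                   # dL/dw
        velocity = momentum * velocity - learning_rate * grad  # accumulate past gradients
        w = w + velocity                                       # move along the velocity
    return 0.5 * curvature * w**2

# Same learning rate, three hypothetical momentum values
loss_m0 = momentum_final_loss(momentum=0.0)
loss_m05 = momentum_final_loss(momentum=0.5)
loss_m09 = momentum_final_loss(momentum=0.9)

print(f'momentum 0.0: final loss = {loss_m0:.3e}')
print(f'momentum 0.5: final loss = {loss_m05:.3e}')
print(f'momentum 0.9: final loss = {loss_m09:.3e}')
```

On this toy problem, where `learning_rate * curvature` is small, a momentum of $m$ behaves roughly like a learning rate scaled by $1/(1-m)$, so the larger momentum gets closer to the minimum in the same number of steps. Whether that carries over to the noisy Lorentz fit is something to check against the loss curves above.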