Key Word(s): Regularization, Neural Networks, Data Augmentation, Weight Decay, Dropout
NOTE: This graph is only a sample.
Instructions:¶
- Generate the predictor and response data using the helper code given.
- Split the data into train and test sets.
- Visualise the split data using the helper code.
- Build a simple neural network with 5 hidden layers of 100 neurons each, using the given pre-trained weights. This network has no regularization.
- Compile the model with MSE as the loss.
- Fit the model on the training data and save the history.
- Use the helper code to visualise the MSE of the train and test data with respect to the epochs.
- Predict on the entire data.
- Use the helper function to plot the predictions along with the generated data.
- Repeat steps 4 to 8 by building the same neural network with early stopping.
- The last plot will consist of the predictions of both the neural networks. The graph will look similar to the one given above.
Hints:¶
Use the kernel_regularizer argument of the Dense layer to apply l1 or l2 regularization. More details can be found here.
tf.keras.Sequential() : A Sequential model is for a plain stack of layers where each layer has exactly one input tensor and one output tensor.
tf.keras.optimizers : An optimizer is one of the two arguments required for compiling a Keras model.
model.add() : Adds layers to the model.
model.compile() : Compiles the layers defined into a neural network
model.fit() : Fits the data to the neural network
model.predict() : Used to predict the values given the model
history : A History object is returned from calls to the fit() function used to train the model. The metrics are stored in a dictionary in the history attribute of the returned object.
tf.keras.regularizers.L2() : A regularizer that applies an L2 regularization penalty.
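Although the models in this exercise use no explicit weight penalty, the last two hints can be combined into a minimal sketch of what an L2-regularized Dense layer looks like. The model name and the penalty factor 0.01 below are arbitrary, illustrative choices and not part of the exercise:
# Illustrative sketch only: a Dense layer with an L2 penalty on its weights
from tensorflow.keras import layers, models, regularizers

l2_demo = models.Sequential(name='L2_demo')
l2_demo.add(layers.Dense(100, activation='relu', input_shape=(1,),
                         kernel_regularizer=regularizers.L2(0.01)))
l2_demo.add(layers.Dense(1, activation='linear'))
l2_demo.compile(loss='MSE', optimizer='adam')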
# Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
import tensorflow as tf
np.random.seed(0)
tf.random.set_seed(0)
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers
from tensorflow.keras.models import load_model
from tensorflow.keras import regularizers
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
%matplotlib inline
# Use the helper code below to generate the data
# Defines the number of data points to generate
num_points = 30
# Generate predictor points (x) between 0 and 5
x = np.linspace(0,5,num_points)
# Generate the response variable (y) using the predictor points
y = x * np.sin(x) + np.random.normal(loc=0, scale=1, size=num_points)
# Generate data of the true function y = x*sin(x)
# x_b will be used for all predictions below
x_b = np.linspace(0,5,100)
y_b = x_b*np.sin(x_b)
# Split the data into train and test sets with test_size=0.33 and random_state=42
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)
# Helper code to plot the generated data
# Plot the train data
plt.rcParams["figure.figsize"] = (10,8)
plt.plot(x_train,y_train, '.', label='Train data', markersize=15, color='#FF9A98')
# Plot the test data
plt.plot(x_test,y_test, '.', label='Test data', markersize=15, color='#75B594')
# Plot the true data
plt.plot(x_b, y_b, '-', label='True function', linewidth=3, color='#5E5E5E')
# Set the axes labels
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
# Building an unregularized NN.
# Initialise the NN and give it an appropriate name for ease of reading
# The FCNN has 5 hidden layers, each with 100 nodes
model_1 = models.Sequential(name='Unregularized')
# Add 5 hidden layers with 100 neurons each
model_1.add(layers.Dense(100, activation='tanh', input_shape=(1,)))
model_1.add(layers.Dense(100, activation='relu'))
model_1.add(layers.Dense(100, activation='relu'))
model_1.add(layers.Dense(100, activation='relu'))
model_1.add(layers.Dense(100, activation='relu'))
# Add the output layer with one neuron
model_1.add(layers.Dense(1, activation='linear'))
# View the model summary
model_1.summary()
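As a quick sanity check on the summary: the first hidden layer has 1·100 + 100 = 200 parameters, each of the remaining four hidden layers has 100·100 + 100 = 10,100, and the output layer has 100·1 + 1 = 101, giving 40,701 trainable parameters in total.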
# Load the weights already provided for the unregularized network
model_1.load_weights('weights.h5')
# Compile the model
model_1.compile(loss='MSE',optimizer=optimizers.Adam(learning_rate=0.001))
# Use the model above to predict for x_b (used exclusively for plotting)
y_pred = model_1.predict(x_b)
# Use the model above to predict on the test data
y_pred_test = model_1.predict(x_test)
# Compute the MSE on the test data
mse = mean_squared_error(y_test,y_pred_test)
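If you want to inspect the value just computed, a simple print is enough:
# Print the test MSE of the unregularized model
print(f'Unregularized model test MSE: {mse:.3f}')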
# Use the helper code to plot the predicted data
plt.rcParams["figure.figsize"] = (10,8)
plt.plot(x_b, y_pred, label = 'Unregularized model', color='#5E5E5E', linewidth=3)
plt.plot(x_train,y_train, '.', label='Train data', markersize=15, color='#FF9A98')
plt.plot(x_test,y_test, '.', label='Test data', markersize=15, color='#75B594')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
Implement the previous NN with early stopping¶
To implement early stopping, we build the same network and then stop training early using a Keras callback.
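In Keras, early stopping is provided by the tf.keras.callbacks.EarlyStopping callback, which halts training once the monitored quantity has stopped improving for patience consecutive epochs. A minimal illustration (the variable name es_demo is only illustrative; the cell below asks you to create the callback yourself):
# Illustration only: stop training once the monitored loss has not improved for 10 epochs
es_demo = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)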
# Building an unregularized NN with early stopping.
# Initialise the NN and give it an appropriate name for ease of reading
# The FCNN has 5 hidden layers, each with 100 nodes
model_2 = models.Sequential(name='EarlyStopping')
# Add 5 hidden layers with 100 neurons each
# tanh is the activation for the first layer
# relu is the activation for all other layers
model_2.add(layers.Dense(100, activation='tanh', input_shape=(1,)))
model_2.add(layers.Dense(100, activation='relu'))
model_2.add(layers.Dense(100, activation='relu'))
model_2.add(layers.Dense(100, activation='relu'))
model_2.add(layers.Dense(100, activation='relu'))
# Add the output layer with one neuron
model_2.add(layers.Dense(1, activation='linear'))
# View the model summary
model_2.summary()
# Use the keras early stopping callback with patience=10 while monitoring the loss
callback = ___
# Compile the model with MSE as loss and Adam optimizer with learning rate as 0.001
___
# Save the history about the model after fitting on the train data
# Use 0.2 validation split with 1500 epochs and batch size of 10
# Use the callback for early stopping here
history_2 = ___
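If you get stuck, the following is one possible completion of the blanks above (a sketch, not the only valid answer; the settings simply follow the comments in the cell, and verbose=0 only silences the per-epoch output):
# One possible completion (sketch), following the comments above
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)
model_2.compile(loss='MSE', optimizer=optimizers.Adam(learning_rate=0.001))
history_2 = model_2.fit(x_train, y_train, validation_split=0.2, epochs=1500,
                        batch_size=10, callbacks=[callback], verbose=0)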
# Helper code to plot the data
# Plot the MSE of the model
plt.rcParams["figure.figsize"] = (10,8)
plt.title("Early stop model")
plt.semilogy(history_2.history['loss'], label='Train Loss', color='#FF9A98', linewidth=2)
plt.semilogy(history_2.history['val_loss'], label='Validation Loss', color='#75B594', linewidth=2)
# Set the axes labels
plt.xlabel('Epochs')
plt.ylabel('Log MSE Loss')
plt.legend()
plt.show()
# Use the early stop implemented model above to predict for x_b (used exclusively for plotting)
y_early_stop_pred = ___
# Use the model above to predict on the test data
y_early_stop_pred_test = ___
# Compute the test MSE by predicting on the test data
mse_es = ___
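Again, one possible completion (a sketch that mirrors the unregularized model's prediction cell above):
# One possible completion (sketch): predict with the early-stopped model and compute the test MSE
y_early_stop_pred = model_2.predict(x_b)
y_early_stop_pred_test = model_2.predict(x_test)
mse_es = mean_squared_error(y_test, y_early_stop_pred_test)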
# Use the helper code to plot the predicted data
# Plotting the predicted data using the early stopping model
plt.rcParams["figure.figsize"] = (10,8)
plt.plot(x_b, y_early_stop_pred, label='Early stop regularized model', color='black', linewidth=2)
# Plotting the predicted data using the unregularized model
plt.plot(x_b, y_pred, label = 'Unregularized model', color='#005493', linewidth=2)
# Plotting the training data
plt.plot(x_train,y_train, '.', label='Train data', markersize=15, color='#FF9A98')
# Plotting the testing data
plt.plot(x_test,y_test, '.', label='Test data', markersize=15, color='#75B594')
# Set the axes labels
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
Mindchow 🍲¶
After marking, change the patience parameter in the early stopping callback once to 2 and once to 100 with the same data. Do you notice any change? Which value is more efficient?
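One way to explore this question is sketched below. The helper function build_model and the comparison loop are only illustrative; they rebuild the same 5-hidden-layer architecture used earlier and report, for each patience value, how many epochs training actually ran and the resulting test MSE:
# Illustrative sketch: compare early stopping with patience=2 and patience=100
def build_model():
    m = models.Sequential()
    m.add(layers.Dense(100, activation='tanh', input_shape=(1,)))
    for _ in range(4):
        m.add(layers.Dense(100, activation='relu'))
    m.add(layers.Dense(1, activation='linear'))
    m.compile(loss='MSE', optimizer=optimizers.Adam(learning_rate=0.001))
    return m

for patience in [2, 100]:
    m = build_model()
    es = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=patience)
    h = m.fit(x_train, y_train, validation_split=0.2, epochs=1500,
              batch_size=10, callbacks=[es], verbose=0)
    test_mse = mean_squared_error(y_test, m.predict(x_test))
    print(f'patience={patience}: stopped after {len(h.history["loss"])} epochs, '
          f'test MSE = {test_mse:.3f}')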