Key Word(s): Gradient Descent, Stochastic Gradient Descent, Back Propagation, Optimizers
Instructions:
- Get the predictor and response variables from the file backprop.csv and assign them to variables x and y.
- Build a forward pass of the above neural network with one hidden layer. You will build this neural network using numpy (no deep learning package allowed).
- Initialize the weights randomly with the random seed as 310, and make a prediction.
- Plot your neural net predictions with the true value.
- Compute the mean_squared_error of your predictions with the actual values.
- Find the derivative of the loss function with respect to $w_1$.
- Find the derivative of the loss function with respect to $w_2$.
- Use the derivatives to update $w_1$ and $w_2$ .
- Use the updated weights to make a forward pass and compute new predictions.
- Plot the new predictions with the actual data. This will look similar to the one given above.
- Calculate your $MSE$ and compare with the earlier value.
Hints:
Loss function:
$$L = \frac{1}{n}\sum_{i=1}^{n}\left(y_{pred}-y_{true}\right)^2$$
Activation function:
$$f\left(x\right)=\sin x$$
ax.plot() : Plots y versus x as lines and/or markers.
np.exp() : Calculates the exponential of all elements in the input array.
plt.xlabel() : This is used to specify the text to be displayed as the label for the x-axis.
plt.ylabel() : This is used to specify the text to be displayed as the label for the y-axis.
Note: This exercise is auto-graded and you can try multiple attempts.
# import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
%matplotlib inline
# get the data from the file `backprop.csv`
df = pd.read_csv('backprop.csv')
# The input needs to be a 2D array. Since we have a single
# column of shape (1000,), we reshape it to a 2D array of shape (1000, 1)
x = df.x.values.reshape(-1,1)
y = df.y.values
# Designing the simple neural network
def neural_network(W, x):
    # W is a list of the two weights (w1, w2) of your neural network
    # x is the input to the neural network
    '''
    Compute a1, a2, and y
    a1 is the dot product of the input and weight
    To compute a2, first use the activation function on a1, then multiply by w2
    Finally, use the activation function on a2 to compute y
    Return all three values, which you will use to compute derivatives later
    '''
    a1 = np.dot(___, ___)
    fa1 = np.sin(___)
    a2 = np.dot(___, ___)
    y = np.sin(___)
    return a1, a2, y
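If you are unsure how the pieces fit together, here is a minimal sketch that follows the docstring above (the name neural_network_sketch is only illustrative, so it does not overwrite the function you are completing; fill in the blanks yourself before comparing):
# A possible completion of the forward pass (reference sketch only)
def neural_network_sketch(W, x):
    a1 = np.dot(x, W[0])      # hidden pre-activation: input times w1
    fa1 = np.sin(a1)          # hidden activation f(a1) = sin(a1)
    a2 = np.dot(fa1, W[1])    # output pre-activation: f(a1) times w2
    y = np.sin(a2)            # prediction f(a2) = sin(a2)
    return a1, a2, y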
# Initialize the weights, but keep the random seed as 310 for reproducible results
np.random.seed(310)
W = [np.random.randn(1, 1), np.random.randn(1, 1)]
# Plot the predictor and response variables
fig,ax = plt.subplots(1,1,figsize=(8,6))
# plot the true x and y values
ax.plot(x,y,label = 'True function',color='darkblue',linewidth=2)
# plot the x values with the network predictions
ax.plot(x,neural_network(W,x)[2],label = 'Neural net predictions',color='#9FC131FF',linewidth=2)
# Set the x and y labels
ax.set_xlabel('$x$',fontsize=14)
ax.set_ylabel('$y$',fontsize=14)
ax.legend(fontsize=14);
### edTest(test_nn_mse) ###
# You can use the mean_squared_error function to find the MSE of your predictions with true function values
y_pred = ___
mse = mean_squared_error(y, y_pred)
print(f'The MSE of the neural network predictions wrt true function is {mse:.2f}')
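If you are stuck on the blank above, one possible completion (a sketch, mirroring the plotting call earlier) is:
# The prediction is the third value returned by the forward pass
y_pred = neural_network(W, x)[2]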
Single update
# Here we will update the weights only once
# Get the predicted response, and the two a's of the network
a1,a2,y_pred = neural_network(W,x)
# Compute the gradient of the loss function with respect to weight 2
# Use pen and paper to calculate these derivatives before coding them
dldw2 = ___
# Now compute the gradient of the loss function with respect to weight 1
dldw1 = ___
# combine the two in a list
dldw = [np.mean(dldw1),np.mean(dldw2)]
# In the update step, make sure to update the weights with their gradients
Wnew = [i - j for i,j in zip(W,dldw)]
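As a check on your pen-and-paper work, applying the chain rule to $y_{pred}=\sin(a_2)$, $a_2=\sin(a_1)\,w_2$ and $a_1=x\,w_1$ gives (a sketch; the sums run over the $n$ samples):
$$\frac{\partial L}{\partial w_2}=\frac{2}{n}\sum_{i=1}^{n}\left(y_{pred}-y_{true}\right)\cos\left(a_2\right)\sin\left(a_1\right)$$
$$\frac{\partial L}{\partial w_1}=\frac{2}{n}\sum_{i=1}^{n}\left(y_{pred}-y_{true}\right)\cos\left(a_2\right)w_2\cos\left(a_1\right)x$$
One possible way to compute the per-sample terms that np.mean then averages (an assumption about shapes: a1, a2 and y_pred are (n, 1) while y is (n,), so y is reshaped to a column to avoid unwanted broadcasting):
# Per-sample gradient terms; np.mean of these gives the (2/n) sums above
err = y_pred - y.reshape(-1, 1)
dldw2 = 2 * err * np.cos(a2) * np.sin(a1)
dldw1 = 2 * err * np.cos(a2) * W[1] * np.cos(a1) * x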
# Plot the predictor and response variables
fig,ax = plt.subplots(1,1,figsize=(8,6))
ax.plot(x,y,label = 'True function',color='darkblue',linewidth=2)
ax.plot(x,neural_network(Wnew,x)[2],label = 'Neural net predictions',color='#9FC131FF',linewidth=2)
ax.set_xlabel('$x$',fontsize=14)
ax.set_ylabel('$y$',fontsize=14)
ax.legend(fontsize=14);
### edTest(test_one_update_mse) ###
# Compute the new MSE after one update and print it
y_pred = ___
mse_update = ___
print(f'The MSE of the new neural network predictions with true function is {mse_update:.2f} as compared to {mse:.2f} from before ')
Several updates
In general, a single update is unlikely to be enough to noticeably improve the model predictions. In the segment below, use the method from above and update the weights 300 times before plotting the predictions.
Does your MSE decrease?
# Reinitialize the weights to start again
np.random.seed(310)
W = [np.random.randn(1, 1), np.random.randn(1, 1)]
# Unlike the previous step, this time we will set a learning rate of 0.01 to avoid drastic updates and run the update loop 300 times
lmb = 0.01
for i in range(300):
    a1, a2, y_pred = ___
    # Remember to use np.mean
    dldw2 = ___
    dldw1 = ___
    W[0] = W[0] - lmb * dldw1
    W[1] = W[1] - lmb * dldw2
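For reference, here is a minimal sketch of how the loop body could be completed, reusing the chain-rule gradients from the single-update section (it assumes the cells above have been run; fill in the blanks yourself before comparing):
# Sketch of one possible training loop (reference only)
for i in range(300):
    a1, a2, y_pred = neural_network(W, x)
    err = y_pred - y.reshape(-1, 1)                                # prediction error, shape (n, 1)
    dldw2 = np.mean(2 * err * np.cos(a2) * np.sin(a1))             # mean dL/dw2 over the samples
    dldw1 = np.mean(2 * err * np.cos(a2) * W[1] * np.cos(a1) * x)  # mean dL/dw1 over the samples
    W[0] = W[0] - lmb * dldw1
    W[1] = W[1] - lmb * dldw2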
# Plot your results and calculate the MSE
# Plot the predictor and response variables
fig,ax = plt.subplots(1,1,figsize=(8,6))
ax.plot(x,y,label = 'True function',color='darkblue',linewidth=2)
ax.plot(x,neural_network(W,x)[2],label = 'Neural net predictions',color='#9FC131FF',linewidth=2)
ax.set_xlabel('$x$',fontsize=14)
ax.set_ylabel('$y$',fontsize=14)
ax.legend(fontsize=14);
### edTest(test_mse) ###
# We again compute the MSE and compare it with the original predictions
y_pred = ___
mse_final = mean_squared_error(y,y_pred)
print(f'The final MSE is {mse_final:.2f} as compared to {mse:.2f} from before ')
Mindchow 🍲
Notice that your predicted values are off from the actual values by approximately 0.5.
After marking, go back to your neural network and add a bias correction of 0.5 to your predictions,
i.e. y = np.sin(a2) + 0.5
and rerun your code.
Does your code fit better? And does your $MSE$ reduce?
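For reference, the bias correction only changes the last line of the forward pass; a minimal sketch (the name neural_network_bias is only illustrative):
# Same forward pass as before, with the suggested 0.5 bias added to the prediction
def neural_network_bias(W, x):
    a1 = np.dot(x, W[0])
    fa1 = np.sin(a1)
    a2 = np.dot(fa1, W[1])
    y = np.sin(a2) + 0.5   # bias correction on the output
    return a1, a2, y
Since the added constant does not depend on $w_1$ or $w_2$, the gradient expressions above stay the same; only $y_{pred}$ changes.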