{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Title\n",
"\n",
"**Exercise: Back-propagation by chain rule**\n",
"\n",
"# Description\n",
"\n",
"The aim of this exercise is to understand how the chain rule works. We will continue using the simple neural network from the previous exercise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Instructions:\n",
"\n",
"- Get the response and predictor variables from the backprop.csv file.\n",
"- Visualise the data generated.\n",
"- For the given simple neural network, write a function that computes the gradient of the loss function with respect to the weights.\n",
"- To do this, compute the partial derivatives using individual functions. Refer to the instructions in the scaffold.\n",
"\n",
"# Hints:\n",
"\n",
"The partial derivative of the loss function $L$ wrt $w_2$ and $w_1$ can be expressed as:\n",
"\n",
"$$\\frac{\\partial L}{\\partial w_2}\\ =\\ \\frac{\\partial L}{\\partial y}\\ \\frac{\\partial y}{\\partial a_2}\\frac{\\partial a_2}{\\partial w_2}$$\n",
"\n",
"$$\\frac{\\partial L}{\\partial w_1}\\ =\\ \\frac{\\partial L}{\\partial y}\\ \\frac{\\partial y}{\\partial a_2}\\frac{\\partial a_2}{\\partial h_1}\\ \\frac{\\partial h_1}{\\partial a_1}\\frac{\\partial a_1}{\\partial w_1}$$\n",
"\n",
"\n",
"np.cos() : Returns cosine element-wise\n",
"\n",
"np.sin() : Returns sine element-wise\n",
"\n",
"v : Calculates the exponential of all elements in the input array.\n",
"\n",
"NOTE - In this exercise, we expect you to take out a piece of paper an do the backpropagation using chain rule by hand."
]
},
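{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a starting point for the pen-and-paper derivation, the network can be written out symbolically. This is a sketch that assumes no bias terms and the same activation $A$ at both layers, as in the `neural_network` function defined later in this notebook; $A'$ is notation introduced here for the derivative of $A$.\n",
"\n",
"$$a_1 = w_1 x, \\quad h_1 = A(a_1), \\quad a_2 = w_2 h_1, \\quad y = A(a_2)$$\n",
"\n",
"Under these assumptions, each factor in the chain rule is a local derivative of one of the steps above:\n",
"\n",
"$$\\frac{\\partial a_1}{\\partial w_1} = x, \\quad \\frac{\\partial h_1}{\\partial a_1} = A'(a_1), \\quad \\frac{\\partial a_2}{\\partial h_1} = w_2, \\quad \\frac{\\partial a_2}{\\partial w_2} = h_1, \\quad \\frac{\\partial y}{\\partial a_2} = A'(a_2)$$\n",
"\n",
"Working out $A'$ for the chosen activation is left as part of the exercise."
]
},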
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Import the necessary libraries\n",
"\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"# Read the file 'backprop.csv'\n",
"\n",
"df = pd.read_csv('backprop.csv')"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"#Generating the predictor and response data \n",
"x = df.x.values.reshape(-1,1)\n",
"y = df.y.values"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"# Initialize the weights and use the same random seed as the previous exercise i.e. 310\n",
"np.random.seed(310)\n",
"W = [np.random.randn(1, 1), np.random.randn(1, 1)]"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"# Function to compute the activation function\n",
"def A(x):\n",
" return ___\n",
"\n",
"# Function to compute the derivative of the activation function\n",
"def der_A(x):\n",
" return ___"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"# Defining a simple neural network we used in the previous exercise\n",
"\n",
"def neural_network(W, x):\n",
" \n",
" # Computing the first affine\n",
" a1 = np.dot(x, W[0])\n",
" \n",
" # Defining sin() as the activation function\n",
" fa1 = A(a1)\n",
" \n",
" # Computing the second affine\n",
" a2 = np.dot(fa1,W[1])\n",
" \n",
" # Defining sin() as the activation function\n",
" y = A(a2)\n",
" \n",
" return a1,a2,y"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"#Use the helper code below to plot the true data and the predictions of your neural network\n",
"\n",
"fig,ax = plt.subplots(1,1,figsize=(8,6))\n",
"ax.plot(x,y,label = 'True Function',color='darkblue',linewidth=2)\n",
"ax.plot(x,neural_network(W,x)[2],label = 'Neural Network Predictions',color='#9FC131FF',linewidth=2)\n",
"ax.set_xlabel('$x$',fontsize=14)\n",
"ax.set_ylabel('$y$',fontsize=14)\n",
"ax.legend(fontsize=14, loc='best');"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"# Function to compute the partial derivate of a (particular neuron) with respect to corresponding weight w\n",
"\n",
"def dadw(x,firstweight=0):\n",
" '''\n",
" The derivative of the 'a' wrt the preceding weight is just the activation of the previous neuron\n",
" Note, account for the case where the input layer has no activation layers associated with it. i.e return x if its the first weight \n",
" '''\n",
" if firstweight == 1:\n",
" return ___\n",
" return ___"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"# Function to compute the partial derivate of h with respect to a\n",
"\n",
"def dhda(a):\n",
" '''\n",
" This is the derivative of the output of the activation function wrt the affine transformation.\n",
" Return the derivative of the activation of the affine transformation\n",
" '''\n",
" \n",
" return ___"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Function to compute the partial derivate of y with respect to a\n",
"\n",
"def dyda(a):\n",
" '''\n",
" This is the derivative of the output of the neural network wrt the affine transformation.\n",
" Return the derivative of the activation of the affine transformation\n",
" '''\n",
" \n",
" return ___"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Function to compute the partial derivate of a with respect to h\n",
"def dadh(w):\n",
" \n",
" return ___"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"# Function to compute the partial derivate of loss with respect to y\n",
"def dldy(y_pred,y):\n",
" '''\n",
" Since our loss function is the MSE,\n",
" The partial derivative of L wrt y will be 2*(y_pred - y), for all predictions and response\n",
" '''\n",
" \n",
" return ___"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"# Function to compute the partial derivate of loss with respect to w\n",
"\n",
"def dldw(W,x):\n",
" \n",
" '''\n",
" Now, combine the functions from above and find the derivative wrt weights.\n",
" These will be for all the points, hence take a mean of all values for each partial derivative and return as a list of 2 values\n",
" \n",
" '''\n",
" dldw2 = ____\n",
" dldw1 = ____\n",
" \n",
" return [np.mean(dldw2),np.mean(dldw1)]"
]
},
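{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mean taken in `dldw` can be motivated as follows. This is a sketch that assumes the loss is the MSE over $n$ data points; $\\hat{y}_i$ (the prediction at $x_i$) and the index $i$ are notation introduced here for illustration.\n",
"\n",
"$$L = \\frac{1}{n}\\sum_{i=1}^{n}(\\hat{y}_i - y_i)^2 \\quad \\Rightarrow \\quad \\frac{\\partial L}{\\partial w_2} = \\frac{1}{n}\\sum_{i=1}^{n} 2(\\hat{y}_i - y_i)\\,\\frac{\\partial \\hat{y}_i}{\\partial a_{2,i}}\\,\\frac{\\partial a_{2,i}}{\\partial w_2}$$\n",
"\n",
"Under that assumption, the chain-rule factors are multiplied elementwise for each data point and the products are then averaged. The same reasoning applies to $w_1$, with the longer chain of factors."
]
},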
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_gradient) ###\n",
"\n",
"# Get the predicted response, and the two activations of the network\n",
"a1, a2, y_pred = neural_network(W,x)\n",
"\n",
"# Compute the gradient of the loss function with respect to the weights using function defined above\n",
"gradW = dldw(W,x)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print the list of your gradients below\n",
"print(f'The derivatives of w1 w2 wrt L are {gradW}')"
]
},
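{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to sanity-check the printed gradients is a central finite difference. This is a sketch rather than part of the exercise: $\\varepsilon$ is a small step size chosen here for illustration, and $L(w_j \\pm \\varepsilon)$ denotes the loss with only weight $w_j$ perturbed.\n",
"\n",
"$$\\frac{\\partial L}{\\partial w_j} \\approx \\frac{L(w_j + \\varepsilon) - L(w_j - \\varepsilon)}{2\\varepsilon}$$\n",
"\n",
"If the chain-rule gradient is correct, the two values should agree closely for a sufficiently small $\\varepsilon$."
]
},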
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Mindchow 🍲\n",
"\n",
"1. Compare your computed partial derivatives wrt the previous exercise. Are they the same?\n",
"\n",
"2. This example was just for a simple case of 1 neuron in 1 hidden layer. How could we generalize this idea to compute partial derivatives of all the weights?"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}