{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Title\n",
"\n",
"**Exercise: B.1 - MLP by Hand**\n",
"\n",
"# Description\n",
"\n",
"In this exercise, we will **construct a neural network** to classify 3 species of iris. The classification is based on 4 measurement predictor variables: sepal length & width, and petal length & width in the given dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Instructions:\n",
"The Neural Network will be built from scratch using pre-trained weights and biases. Hence, we will only be doing the forward (i.e., prediction) pass. \n",
"\n",
"- Load the iris dataset from sklearn standard datasets.\n",
"- Assign the predictor and response variables appropriately.\n",
"- One hot encode the categorical labels of the predictor variable.\n",
"- Load and inspect the pre-trained weights and biases.\n",
"- Construct the MLP:\n",
" - Augment X with a column of ones to create the augmented design matrix X \n",
" - Create the first layer weight matrix by vertically stacking the bias vector on top of the weight vector\n",
" - Perform the affine transformation \n",
" - Activate the output of the affine transformation using ReLU \n",
" - Repeat the first 3 steps for the hidden layer (augment, vertical stack, affine)\n",
" - Use softmax on the final layer\n",
" - Finally, predict y \n",
" \n",
"# Hints:\n",
"This will further develop our intuition for the architecture of a deep neural network. This diagram shows the structure of our network. You may find it useful to refer to it during the exercise."
]
},
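{
"cell_type": "markdown",
"metadata": {},
"source": [
"One compact way to write the full forward pass described above (a summary sketch: $\\tilde{X}$ denotes $X$ augmented with a column of ones, $\\tilde{W}^{(1)}$ and $\\tilde{W}^{(2)}$ denote the weight matrices with their bias vectors stacked on top, and $\\tilde{H}^{(1)}$ is the hidden-layer output $H^{(1)}$ augmented in the same way):\n",
"\n",
"$$\n",
"Z^{(1)} = \\tilde{X}\\,\\tilde{W}^{(1)}, \\qquad H^{(1)} = \\mathrm{ReLU}\\left(Z^{(1)}\\right),\n",
"$$\n",
"\n",
"$$\n",
"Z^{(2)} = \\tilde{H}^{(1)}\\,\\tilde{W}^{(2)}, \\qquad \\hat{P} = \\mathrm{softmax}\\left(Z^{(2)}\\right), \\qquad \\hat{y}_i = \\operatorname{argmax}_j\\, \\hat{P}_{ij}.\n",
"$$"
]
},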
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is our first encounter with a multi-class classification problem and also the softmax activation on the output layer. Note: $f_1()$ above is the ReLU activation and $f_2()$ is the softmax.\n",
"\n",
"to_categorical(y, num_classes=None, dtype='float32') : Converts a class vector (integers) to the binary class matrix.\n",
"\n",
"np.vstack(tup) : Stack arrays in sequence vertically (row-wise).\n",
"\n",
"numpy.dot(a, b, out=None) : Returns the dot product of two arrays.\n",
"\n",
"numpy.argmax(a, axis=None, out=None) : Returns the indices of the maximum values along an axis.\n",
"\n",
"Note: This exercise is **auto-graded and you can try multiple attempts.**"
]
},
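{
"cell_type": "markdown",
"metadata": {},
"source": [
"Putting the hints together, here is a small, self-contained toy sketch of the forward pass described in the instructions, on made-up arrays (2 samples, 4 predictors, 3 hidden nodes, 3 classes). The values and the `_toy` variable names are illustrative only; the actual exercise uses the iris data and the pre-trained weights loaded below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy sketch of the forward pass (illustrative values only)\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"X_toy = rng.normal(size=(2, 4))   # 2 samples, 4 predictors\n",
"w1_toy = rng.normal(size=(4, 3))  # hidden-layer weights\n",
"b1_toy = np.zeros(3)              # hidden-layer biases\n",
"w2_toy = rng.normal(size=(3, 3))  # output-layer weights\n",
"b2_toy = np.zeros(3)              # output-layer biases\n",
"\n",
"# Augment the input with a column of ones and stack the bias on top of the weights\n",
"X_aug = np.hstack([np.ones((X_toy.shape[0], 1)), X_toy])\n",
"W1_aug = np.vstack([b1_toy, w1_toy])\n",
"\n",
"# First affine transformation followed by ReLU\n",
"z1 = np.dot(X_aug, W1_aug)\n",
"h1 = np.maximum(0, z1)\n",
"\n",
"# Repeat for the output layer, then apply softmax row-wise\n",
"h1_aug = np.hstack([np.ones((h1.shape[0], 1)), h1])\n",
"W2_aug = np.vstack([b2_toy, w2_toy])\n",
"z2 = np.dot(h1_aug, W2_aug)\n",
"p = np.exp(z2) / np.exp(z2).sum(axis=1, keepdims=True)\n",
"\n",
"# Predicted class = index of the largest softmax probability\n",
"y_pred_toy = np.argmax(p, axis=1)\n",
"print(p)\n",
"print(y_pred_toy)"
]
},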
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#Import library\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import tensorflow as tf\n",
"from sklearn.datasets import load_iris\n",
"from tensorflow.keras.utils import to_categorical\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X shape: (150, 4)\n",
"y shape: (150,)\n"
]
}
],
"source": [
"#Load the iris data\n",
"iris_data = load_iris()\n",
"\n",
"#Get the predictor and reponse variables\n",
"X = iris_data.data\n",
"y = iris_data.target\n",
"\n",
"#See the shape of the data\n",
"print(f'X shape: {X.shape}')\n",
"print(f'y shape: {y.shape}')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Y shape: (150, 3)\n"
]
}
],
"source": [
"#One-hot encode target labels\n",
"Y = to_categorical(y)\n",
"print(f'Y shape: {Y.shape}')"
]
},
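{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick optional check, the first few rows show how each integer class label maps to a one-hot row; `np.argmax` along axis 1 recovers the original labels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Inspect the encoding: integer labels vs. their one-hot rows\n",
"print(y[:5])\n",
"print(Y[:5])\n",
"\n",
"# argmax along each row undoes the one-hot encoding\n",
"print(np.argmax(Y, axis=1)[:5])"
]
},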
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load and inspect the pre-trained weights and biases. Compare their shapes to the NN diagram."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"#Load and inspect the pre-trained weights and biases\n",
"weights = np.load('data/weights.npy', allow_pickle=True)\n",
"\n",
"# weights for hidden (1st) layer\n",
"w1 = weights[0] \n",
"\n",
"# biases for hidden (1st) layer\n",
"b1 = weights[1]\n",
"\n",
"# weights for output (2nd) layer\n",
"w2 = weights[2]\n",
"\n",
"#biases for output (2nd) layer\n",
"b2 = weights[3] "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"w1 - shape: (4, 3)\n",
"[[-0.42714605 -0.72814226 0.37730372]\n",
" [ 0.39002347 -0.73936987 0.7850246 ]\n",
" [ 0.12336338 -0.7267647 -0.48210236]\n",
" [ 0.20957732 -0.7505736 -1.3789996 ]]\n",
"\n",
"b1 - shape: (3,)\n",
"[0. 0. 0.31270522]\n",
"\n",
"w2 - shape: (3, 3)\n",
"[[ 0.7043929 0.13273811 -0.845736 ]\n",
" [-0.8318007 -0.6977086 0.75894 ]\n",
" [ 1.1978723 0.14868832 -0.473792 ]]\n",
"\n",
"b2 - shape: (3,)\n",
"[-1.2774311 0.45491916 0.73040146]\n",
"\n"
]
}
],
"source": [
"#Compare their shapes to that in the NN diagram.\n",
"for arr, name in zip([w1,b1,w2,b2], ['w1','b1','w2','b2']):\n",
" print(f'{name} - shape: {arr.shape}')\n",
" print(arr)\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the first affine transformation we need to multiple the augmented input by the first weight matrix (i.e., layer).\n",
"\n",
"$$\n",
"\\begin{bmatrix}\n",
"1 & X_{11} & X_{12} & X_{13} & X_{14}\\\\\n",
"1 & X_{21} & X_{22} & X_{23} & X_{24}\\\\\n",
"\\vdots & \\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
"1 & X_{n1} & X_{n2} & X_{n3} & X_{n4}\\\\\n",
"\\end{bmatrix} \\begin{bmatrix}\n",
"b_{1}^1 & b_{2}^1 & b_{3}^1\\\\\n",
"W_{11}^1 & W_{12}^1 & W_{13}^1\\\\\n",
"W_{21}^1 & W_{22}^1 & W_{23}^1\\\\\n",
"W_{31}^1 & W_{32}^1 & W_{33}^1\\\\\n",
"W_{41}^1 & W_{42}^1 & W_{43}^1\\\\\n",
"\\end{bmatrix} =\n",
"\\begin{bmatrix}\n",
"z_{11}^1 & z_{12}^1 & z_{13}^1\\\\\n",
"z_{21}^1 & z_{22}^1 & z_{23}^1\\\\\n",
"\\vdots & \\vdots & \\vdots \\\\\n",
"z_{n1}^1 & z_{n2}^1 & z_{n3}^1\\\\\n",
"\\end{bmatrix}\n",
"= \\textbf{Z}^{(1)}\n",
"$$ \n",
"About the notation: superscript refers to the layer and subscript refers to the index in the particular matrix. So $W_{23}^1$ is the weight in the 1st layer connecting the 2nd input to 3rd hidden node. Compare this matrix representation to the slide image. Also note the bias terms and ones that have been added to 'augment' certain matrices. You could consider $b_1^1$ to be $W_{01}^1$.