{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Title\n",
"\n",
"**Exercise: B.1 - MLP by Hand**\n",
"\n",
"# Description\n",
"\n",
"In this exercise, we will **construct a neural network** to classify 3 species of iris. The classification is based on 4 measurement predictor variables: sepal length & width, and petal length & width in the given dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Instructions:\n",
"The Neural Network will be built from scratch using pre-trained weights and biases. Hence, we will only be doing the forward (i.e., prediction) pass. \n",
"\n",
"- Load the iris dataset from sklearn standard datasets.\n",
"- Assign the predictor and response variables appropriately.\n",
"- One hot encode the categorical labels of the predictor variable.\n",
"- Load and inspect the pre-trained weights and biases.\n",
"- Construct the MLP:\n",
" - Augment X with a column of ones to create the augmented design matrix X \n",
" - Create the first layer weight matrix by vertically stacking the bias vector on top of the weight vector\n",
" - Perform the affine transformation \n",
" - Activate the output of the affine transformation using ReLU \n",
" - Repeat the first 3 steps for the hidden layer (augment, vertical stack, affine)\n",
" - Use softmax on the final layer\n",
" - Finally, predict y \n",
" \n",
"# Hints:\n",
"This will further develop our intuition for the architecture of a deep neural network. This diagram shows the structure of our network. You may find it useful to refer to it during the exercise."
]
},
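{
"cell_type": "markdown",
"metadata": {},
"source": [
"One compact way to write the full forward pass described above (a summary sketch: $\\tilde{X}$ denotes $X$ augmented with a column of ones, $\\tilde{W}^{(1)}$ and $\\tilde{W}^{(2)}$ denote the weight matrices with their bias vectors stacked on top, and $\\tilde{H}^{(1)}$ is the hidden-layer output $H^{(1)}$ augmented in the same way):\n",
"\n",
"$$\n",
"Z^{(1)} = \\tilde{X}\\,\\tilde{W}^{(1)}, \\qquad H^{(1)} = \\mathrm{ReLU}\\left(Z^{(1)}\\right),\n",
"$$\n",
"\n",
"$$\n",
"Z^{(2)} = \\tilde{H}^{(1)}\\,\\tilde{W}^{(2)}, \\qquad \\hat{P} = \\mathrm{softmax}\\left(Z^{(2)}\\right), \\qquad \\hat{y}_i = \\operatorname{argmax}_j\\, \\hat{P}_{ij}.\n",
"$$"
]
},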
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is our first encounter with a multi-class classification problem and also the softmax activation on the output layer. Note: $f_1()$ above is the ReLU activation and $f_2()$ is the softmax.\n",
"\n",
"to_categorical(y, num_classes=None, dtype='float32') : Converts a class vector (integers) to the binary class matrix.\n",
"\n",
"np.vstack(tup) : Stack arrays in sequence vertically (row-wise).\n",
"\n",
"numpy.dot(a, b, out=None) : Returns the dot product of two arrays.\n",
"\n",
"numpy.argmax(a, axis=None, out=None) : Returns the indices of the maximum values along an axis.\n",
"\n",
"Note: This exercise is **auto-graded and you can try multiple attempts.**"
]
},
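{
"cell_type": "markdown",
"metadata": {},
"source": [
"Putting the hints together, here is a small, self-contained toy sketch of the forward pass described in the instructions, on made-up arrays (2 samples, 4 predictors, 3 hidden nodes, 3 classes). The values and the `_toy` variable names are illustrative only; the actual exercise uses the iris data and the pre-trained weights loaded below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy sketch of the forward pass (illustrative values only)\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"X_toy = rng.normal(size=(2, 4))   # 2 samples, 4 predictors\n",
"w1_toy = rng.normal(size=(4, 3))  # hidden-layer weights\n",
"b1_toy = np.zeros(3)              # hidden-layer biases\n",
"w2_toy = rng.normal(size=(3, 3))  # output-layer weights\n",
"b2_toy = np.zeros(3)              # output-layer biases\n",
"\n",
"# Augment the input with a column of ones and stack the bias on top of the weights\n",
"X_aug = np.hstack([np.ones((X_toy.shape[0], 1)), X_toy])\n",
"W1_aug = np.vstack([b1_toy, w1_toy])\n",
"\n",
"# First affine transformation followed by ReLU\n",
"z1 = np.dot(X_aug, W1_aug)\n",
"h1 = np.maximum(0, z1)\n",
"\n",
"# Repeat for the output layer, then apply softmax row-wise\n",
"h1_aug = np.hstack([np.ones((h1.shape[0], 1)), h1])\n",
"W2_aug = np.vstack([b2_toy, w2_toy])\n",
"z2 = np.dot(h1_aug, W2_aug)\n",
"p = np.exp(z2) / np.exp(z2).sum(axis=1, keepdims=True)\n",
"\n",
"# Predicted class = index of the largest softmax probability\n",
"y_pred_toy = np.argmax(p, axis=1)\n",
"print(p)\n",
"print(y_pred_toy)"
]
},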
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#Import library\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import tensorflow as tf\n",
"from sklearn.datasets import load_iris\n",
"from tensorflow.keras.utils import to_categorical\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X shape: (150, 4)\n",
"y shape: (150,)\n"
]
}
],
"source": [
"#Load the iris data\n",
"iris_data = load_iris()\n",
"\n",
"#Get the predictor and reponse variables\n",
"X = iris_data.data\n",
"y = iris_data.target\n",
"\n",
"#See the shape of the data\n",
"print(f'X shape: {X.shape}')\n",
"print(f'y shape: {y.shape}')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Y shape: (150, 3)\n"
]
}
],
"source": [
"#One-hot encode target labels\n",
"Y = to_categorical(y)\n",
"print(f'Y shape: {Y.shape}')"
]
},
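{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick optional check, the first few rows show how each integer class label maps to a one-hot row; `np.argmax` along axis 1 recovers the original labels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Inspect the encoding: integer labels vs. their one-hot rows\n",
"print(y[:5])\n",
"print(Y[:5])\n",
"\n",
"# argmax along each row undoes the one-hot encoding\n",
"print(np.argmax(Y, axis=1)[:5])"
]
},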
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load and inspect the pre-trained weights and biases. Compare their shapes to the NN diagram."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"#Load and inspect the pre-trained weights and biases\n",
"weights = np.load('data/weights.npy', allow_pickle=True)\n",
"\n",
"# weights for hidden (1st) layer\n",
"w1 = weights[0] \n",
"\n",
"# biases for hidden (1st) layer\n",
"b1 = weights[1]\n",
"\n",
"# weights for output (2nd) layer\n",
"w2 = weights[2]\n",
"\n",
"#biases for output (2nd) layer\n",
"b2 = weights[3] "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"w1 - shape: (4, 3)\n",
"[[-0.42714605 -0.72814226 0.37730372]\n",
" [ 0.39002347 -0.73936987 0.7850246 ]\n",
" [ 0.12336338 -0.7267647 -0.48210236]\n",
" [ 0.20957732 -0.7505736 -1.3789996 ]]\n",
"\n",
"b1 - shape: (3,)\n",
"[0. 0. 0.31270522]\n",
"\n",
"w2 - shape: (3, 3)\n",
"[[ 0.7043929 0.13273811 -0.845736 ]\n",
" [-0.8318007 -0.6977086 0.75894 ]\n",
" [ 1.1978723 0.14868832 -0.473792 ]]\n",
"\n",
"b2 - shape: (3,)\n",
"[-1.2774311 0.45491916 0.73040146]\n",
"\n"
]
}
],
"source": [
"#Compare their shapes to that in the NN diagram.\n",
"for arr, name in zip([w1,b1,w2,b2], ['w1','b1','w2','b2']):\n",
" print(f'{name} - shape: {arr.shape}')\n",
" print(arr)\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the first affine transformation we need to multiple the augmented input by the first weight matrix (i.e., layer).\n",
"\n",
"$$\n",
"\\begin{bmatrix}\n",
"1 & X_{11} & X_{12} & X_{13} & X_{14}\\\\\n",
"1 & X_{21} & X_{22} & X_{23} & X_{24}\\\\\n",
"\\vdots & \\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
"1 & X_{n1} & X_{n2} & X_{n3} & X_{n4}\\\\\n",
"\\end{bmatrix} \\begin{bmatrix}\n",
"b_{1}^1 & b_{2}^1 & b_{3}^1\\\\\n",
"W_{11}^1 & W_{12}^1 & W_{13}^1\\\\\n",
"W_{21}^1 & W_{22}^1 & W_{23}^1\\\\\n",
"W_{31}^1 & W_{32}^1 & W_{33}^1\\\\\n",
"W_{41}^1 & W_{42}^1 & W_{43}^1\\\\\n",
"\\end{bmatrix} =\n",
"\\begin{bmatrix}\n",
"z_{11}^1 & z_{12}^1 & z_{13}^1\\\\\n",
"z_{21}^1 & z_{22}^1 & z_{23}^1\\\\\n",
"\\vdots & \\vdots & \\vdots \\\\\n",
"z_{n1}^1 & z_{n2}^1 & z_{n3}^1\\\\\n",
"\\end{bmatrix}\n",
"= \\textbf{Z}^{(1)}\n",
"$$ \n",
"About the notation: superscript refers to the layer and subscript refers to the index in the particular matrix. So $W_{23}^1$ is the weight in the 1st layer connecting the 2nd input to 3rd hidden node. Compare this matrix representation to the slide image. Also note the bias terms and ones that have been added to 'augment' certain matrices. You could consider $b_1^1$ to be $W_{01}^1$.