{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Title\n", "\n", "**Exercise A.1 - Constructing an MLP**\n", "\n", "# Description\n", "\n", "From the exercise A.1, we observe that a single neuron is rather limited in what it can accomplish. But what if we expand the number of neurons in our network to make it more expressive? \n", "\n", "The aim of this exercise is to **construct an MLP** learn its parameters from data using the **Keras** API which is part of Tensorflow 2.x.\n", "\n", "TensorFlow is a framework for representing complicated DNN algorithms and executing them in any platform, from a phone to a distributed system using GPUs. Developed by Google Brain, TensorFlow is used very widely today.\n", "\n", "Keras is a high-level API used for fast prototyping, advanced research, and production. We will use tf.keras which is TensorFlow's implementation of the Keras API.\n", "\n", "You can learn more about Keras here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "[Note: In this image $W$ matrices include both weight and bias terms and the vector input $x$ has been augmented into the matrix $X$ by adding a column of ones]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- In our example, we have a single input, $x$ \n", "- The first layer consists of 2 nodes (or neurons), each with its own weight and bias used to perform an affine transformation on the nodes' respective inputs. We refer to this as the 'hidden' layer. \n", "- Both nodes in the hidden layer use the same activation function of the form $\\sigma\\left(\\cdot\\right)$ on their affine transformations.\n", "- The outputs of the hidden layer nodes must then be combined to give the overall output of the network. This is the output layer. Because we will interpret the output as a probability we just take a linear combination of the hidden layer and pass it through another sigmoid activation to produce the actual prediction, $y$. Notice that the output layer has its own weights and bias.\n", "\n", "This multilayer perceptron is much more expressive than a single perceptron, but setting all the new parameters manually would be quite tedious. 
"\n", "# Instructions:\n", "\n", "- Read the heart dataset as a pandas dataframe\n", "- Assign the predictor and response variables and plot the data\n", "- Instantiate a Keras model\n", "- Add a hidden layer with 2 nodes and a sigmoid activation function\n", "- Add an output layer by choosing the number of nodes and the activation function\n", "- Compile the model with binary cross-entropy as the loss function\n", "- Fit the model on the data by specifying the number of epochs\n", "- Plot the training history\n", "- Predict using the model and compute the accuracy\n", "\n", "# Hints:\n", "\n", "model.add() : To add a layer to the model\n", "\n", "model.fit() : To fit the model on the data\n", "\n", "Note: This exercise is **auto-graded and you can try multiple attempts.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import libraries\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "import tensorflow as tf\n", "from tensorflow import keras\n", "from tensorflow.keras import layers\n", "from tensorflow.keras import models\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We set a random seed to ensure our results are reproducible\n", "seed = 399\n", "np.random.seed(seed)\n", "tf.random.set_seed(seed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multilayer Perceptron\n", "We're using a slightly altered version of the heart data from the previous exercise to illustrate a concept. Notice that now a single sigmoid will not be able to do an acceptable job of classification." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read the data and assign predictor and response variables\n", "altered_heart_data = pd.read_csv('data/altered_heart_data.csv')\n", "x = altered_heart_data.MaxHR.values\n", "y = altered_heart_data.AHD.values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot the data\n", "plt.scatter(x, y, alpha=0.2)\n", "plt.xlabel('MaxHR')\n", "plt.ylabel('Heart Disease (AHD)');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will use Keras to construct our MLP. You only need to add the output layer.\n", "Each step is described in the comments." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## First we instantiate a Keras model\n", "MLP = models.Sequential(name='MLP')\n", "\n", "## [Adding Layers]\n", "# Next we add a hidden layer with 2 nodes and a sigmoid activation function\n", "MLP.add(layers.Dense(2, activation='sigmoid', input_shape=(1,), name='hidden'))\n", "\n", "# Now add the output layer\n", "# Use the code above as an example. You only need to change the arguments\n", "# Choose the number of nodes and the activation, and name it 'output'\n", "# (the input shape will be inferred from the hidden layer's output shape)\n", "# your code here\n", "________\n", "\n", "# [Compilation]\n", "# Here we set the loss to be minimized and a metric to monitor when fitting the MLP\n", "MLP.compile(loss='binary_crossentropy', metrics=['accuracy'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This simple model benefits from setting some reasonable initial weights.\n", "During fitting, these weights will be optimized."
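, "\n", "\n", "To make clear exactly what these parameters compute, the cell below sketches the MLP's forward pass by hand in NumPy (a minimal illustration; all numeric values in it are hypothetical placeholders, not tuned weights):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal NumPy sketch of this MLP's forward pass (illustrative values only):\n", "# hidden: h = sigmoid(x * w_hid + b_hid), output: y = sigmoid(h @ w_out + b_out)\n", "def np_sigmoid(z):\n", "    return 1 / (1 + np.exp(-z))\n", "\n", "w_hid = np.array([0.5, -0.5])  # hypothetical hidden-layer weights, one per node\n", "b_hid = np.array([0.0, 0.0])   # hypothetical hidden-layer biases\n", "w_out = np.array([1.0, 1.0])   # hypothetical output weights, one per hidden node\n", "b_out = 0.0                    # hypothetical output bias\n", "\n", "x_example = 150.0                          # a single MaxHR value\n", "h = np_sigmoid(x_example * w_hid + b_hid)  # two hidden activations\n", "y_example = np_sigmoid(h @ w_out + b_out)  # predicted probability\n", "print(h, y_example)"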
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get the original random weights\n", "# For this model, get_weights() returns [hidden weights, hidden biases, output weights, output biases]\n", "weights = MLP.get_weights()\n", "\n", "# Hidden layer\n", "# Weights\n", "weights[0][0] = np.array([3.1, -3.2])\n", "\n", "# Biases\n", "weights[1] = np.array([-350., 402.])\n", "\n", "# Output layer\n", "# Weights\n", "weights[2] = np.array([[1.29], [1.11]])\n", "\n", "# Biases\n", "weights[3] = np.array([-1.11])\n", "\n", "# Update the model's weights\n", "MLP.set_weights(weights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should always inspect your Keras model with the `summary()` method. Note the number of parameters in each layer:\n", "\n", "The hidden layer has 4 - it contains 2 nodes, each with a weight and a bias.\n", "\n", "The output layer has 3 - it has a weight for each node in the previous layer plus a bias term." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# View the model summary\n", "MLP.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we fit (or 'train') the MLP on our data, updating the weights to minimize the loss we specified in the call to `compile()`. (There will be more details on how this update happens in future lectures.)\n", "\n", "One full training cycle on our data is called an 'epoch.' Usually multiple epochs are required before a model converges. Specify a number of epochs to train for." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Fit the model on the data by specifying the number of epochs\n", "MLP.fit(___);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can plot the training history and observe that, as the weights were updated, our loss declined and the accuracy improved." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot the model's training history\n", "history = MLP.history.history\n", "fig, ax = plt.subplots(1, 2, figsize=(10,4))\n", "ax[0].plot(history['loss'], c='r', label='loss')\n", "ax[0].set_ylabel('crossentropy loss')\n", "ax[1].plot(history['accuracy'], c='b')\n", "ax[1].set_ylabel('accuracy')\n", "for axis in ax:\n", "    axis.set_xlabel('epoch')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the individual outputs of the 2 nodes in the hidden layer."
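, "\n", "As a quick check, the cell below first computes these outputs by hand from the trained weights (a minimal sketch); the cell after it obtains the same values through a Keras sub-model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sanity check: compute the hidden activations by hand from the trained weights\n", "# (these should match the Keras sub-model predictions in the next cell)\n", "W_h, b_h = MLP.get_weights()[:2]      # hidden weights, shape (1, 2), and biases, shape (2,)\n", "x_check = np.array([[120.], [160.]])  # two sample MaxHR values, shape (2, 1)\n", "h_check = 1 / (1 + np.exp(-(x_check @ W_h + b_h)))\n", "print(h_check)  # one row per sample, one column per hidden node"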
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a grid of x values to predict on\n", "x_linspace = np.linspace(np.min(x), np.max(x), 500) \n", "\n", "# Get the output of the hidden layer's nodes\n", "hidden = models.Model(inputs=MLP.input, outputs=MLP.get_layer('hidden').output)\n", "hidden_pred = hidden.predict(x_linspace)\n", "\n", "h1_pred = hidden_pred[:,0]\n", "h2_pred = hidden_pred[:,1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot the output from h1 & h2\n", "fig, ax = plt.subplots(1, 1, figsize=(14,10))\n", "ax.scatter(x, y, alpha=0.4, label='Altered Heart Data')\n", "ax.plot(x_linspace, h1_pred, lw=4, alpha=0.6, label=r'$h_1 = \\sigma(W_1x+\\beta_1)$')\n", "ax.plot(x_linspace, h2_pred, lw=4, alpha=0.6, label=r'$h_2 = \\sigma(W_2x+\\beta_2)$')\n", "\n", "# Set title\n", "ax.set_title('Hidden Layer Nodes', fontsize=24)\n", "\n", "# Create labels (very important!)\n", "ax.set_xlabel('$x$', fontsize=24) # Notice we make the labels big enough to read\n", "ax.set_ylabel('$y$', fontsize=24)\n", "\n", "ax.tick_params(labelsize=24) # Make the tick labels big enough to read\n", "\n", "ax.legend(fontsize=24, loc='best'); # Create a legend and make it big enough to read" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that each node in the hidden layer indeed outputs a different sigmoid.\n", "\n", "Now let's look at how they are combined by the output layer." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get the output layer's predictions\n", "y_pred = MLP.predict(x_linspace)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot the predictions\n", "fig, ax = plt.subplots(1, 1, figsize=(14,10))\n", "\n", "ax.scatter(x, y, alpha=0.4, label=r'Altered Heart Data')\n", "ax.plot(x_linspace, y_pred, lw=4, label=r'$y_{pred} = \\sigma(W_3h_{1} + W_4h_{2}+\\beta_{3})$')\n", "\n", "# Create labels (very important!)\n", "ax.set_xlabel('$x$', fontsize=24) # Notice we make the labels big enough to read\n", "ax.set_ylabel('$y$', fontsize=24)\n", "\n", "ax.tick_params(labelsize=24) # Make the tick labels big enough to read\n", "\n", "ax.legend(fontsize=24, loc='best'); # Create a legend and make it big enough to read" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's compare the MLP's accuracy to a baseline that always predicts the majority class. See if you can get over 80% accuracy (86%+ is possible). You may need to change the number of epochs above and rerun the notebook."
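, "\n", "\n", "Remember that the MLP outputs probabilities. To score it against the 0/1 labels, we first threshold the probabilities at 0.5, as in this small illustration:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative: turning predicted probabilities into class labels at a 0.5 threshold\n", "example_probs = np.array([0.10, 0.49, 0.50, 0.93])\n", "print((example_probs >= 0.5).astype(int))  # -> [0 0 1 1]"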
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def accuracy(y_true, y_pred):\n", "    # Fraction of labels matched after thresholding the predicted probabilities at 0.5\n", "    assert y_true.shape[0] == y_pred.shape[0]\n", "    return sum(y_true == (y_pred >= 0.5).astype(int)) / len(y_true)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### edTest(test_performance) ###\n", "final_pred = MLP.predict(x).flatten()\n", "\n", "baseline_acc = accuracy(y, np.zeros(len(y)))  # baseline: predict the majority class (all zeros)\n", "MLP_acc = accuracy(y, final_pred)\n", "print(f'Baseline Accuracy: {baseline_acc:.2%}')\n", "print(f'MLP Accuracy: {MLP_acc:.2%}')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }