{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Title\n",
    "\n",
    "**Exercise: 1 - Bias Variance Tradeoff**\n",
    "\n",
    "# Description\n",
    "\n",
    "The aim of this exercise is to understand **bias variance tradeoff**. For this, you will fit different degree polynomial regression on the same data and plot them as given below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"../img/image.png\" style=\"width: 500px;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Instructions:\n",
    "- Read the file noisypopulation.csv as a pandas dataframe.\n",
    "- Assign the response and predictor variables appropriately.\n",
    "- Perform bootstrap operation on the dataset. \n",
    "- For each bootstrap:\n",
    "    - For degree of the chosen degree value:\n",
    "        - Compute the polynomial features\n",
    "        - Fit the model on the given data\n",
    "        - Select a set of random points in the data to predict the model\n",
    "        - Store the predicted values as a list\n",
    "- Plot the predicted values along with the random data points and true function as given above.\n",
    "\n",
    "\n",
    "# Hints:\n",
    "\n",
    "<a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html\" target=\"_blank\">sklearn.PolynomialFeatures()</a> : Generates polynomial and interaction features\n",
    "\n",
    "<a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html\" target=\"_blank\">sklearn.LinearRegression()</a> : LinearRegression fits a linear model\n",
    "\n",
    "<a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.fit\" target=\"_blank\">sklearn.fit()</a> : Fits the linear model to the training data\n",
    "\n",
    "<a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.predict\" target=\"_blank\">sklearn.predict()</a> : Predict using the linear model.\n",
    "\n",
    "Note: This exercise is **auto-graded and you can try multiple attempts.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "#Importing libraries\n",
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import scipy as sp\n",
    "import matplotlib as mpl\n",
    "import matplotlib.cm as cm\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from sklearn.preprocessing import PolynomialFeatures"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "#Helper function to set plot characteristics\n",
    "\n",
    "def make_plot():\n",
    "    fig, axes=plt.subplots(figsize=(20,8), nrows=1, ncols=2);\n",
    "    axes[0].set_ylabel(\"$p_R$\", fontsize=18)\n",
    "    axes[0].set_xlabel(\"$x$\", fontsize=18)\n",
    "    axes[1].set_xlabel(\"$x$\", fontsize=18)\n",
    "    axes[1].set_yticklabels([])\n",
    "    axes[0].set_ylim([0,1])\n",
    "    axes[1].set_ylim([0,1])\n",
    "    axes[0].set_xlim([0,1])\n",
    "    axes[1].set_xlim([0,1])\n",
    "    plt.tight_layout();\n",
    "    return axes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>f</th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.047790</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.011307</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.051199</td>\n",
       "      <td>0.01</td>\n",
       "      <td>0.010000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.054799</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.007237</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.058596</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.000056</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.062597</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.010000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          f     x         y\n",
       "0  0.047790  0.00  0.011307\n",
       "1  0.051199  0.01  0.010000\n",
       "2  0.054799  0.02  0.007237\n",
       "3  0.058596  0.03  0.000056\n",
       "4  0.062597  0.04  0.010000"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Reading the file into a dataframe\n",
    "\n",
    "df = pd.read_csv(\"noisypopulation.csv\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "### edTest(test_data) ###\n",
    "\n",
    "# Set the predictor and response variable\n",
    "# Column x is the predictor and column y is the response variable.\n",
    "# Column f is the true function of the given data\n",
    "# Select the values of the columns\n",
    "\n",
    "x = df.___\n",
    "y = df.___\n",
    "f = df.___"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "### edTest(test_poly) ###\n",
    "# Function to compute the Polynomial Features for the data x for the given degree d\n",
    "def polyshape(d, x):\n",
    "    return PolynomialFeatures(___).fit_transform(___.reshape(-1,1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "### edTest(test_linear) ###\n",
    "#Function to fit a Linear Regression model \n",
    "def make_predict_with_model(x, y, x_pred):\n",
    "    \n",
    "    #Create a Linear Regression model with fit_intercept as False\n",
    "    lreg = ___\n",
    "    \n",
    "    #Fit the model to the data x and y\n",
    "    lreg.fit(___, ___)\n",
    "    \n",
    "    #Predict on the x_pred data\n",
    "    y_pred = lreg.predict(___)\n",
    "    return lreg, y_pred"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "#Function to perform bootstrap and fit the data\n",
    "\n",
    "# degree is the maximum degree of the model\n",
    "# num_boot is the number of bootstraps\n",
    "# size is the number of random points selected from the data for each bootstrap\n",
    "# x is the predictor variable\n",
    "# y is the response variable\n",
    "\n",
    "def gen(degree, num_boot, size, x, y):\n",
    "    \n",
    "    #Create 2 lists to store the prediction and model\n",
    "    predicted_values, linear_models =[], []\n",
    "    \n",
    "    #Run the loop for the number of bootstrap value\n",
    "    for i in range(num_boot):\n",
    "        \n",
    "        #Helper code to call the make_predict_with_model function to fit the data\n",
    "        indexes=np.sort(np.random.choice(x.shape[0], size=size, replace=False))\n",
    "        \n",
    "        #lreg and y_pred hold the model and predicted values of the current bootstrap\n",
    "        lreg, y_pred = make_predict_with_model(polyshape(degree, x[indexes]), y[indexes], polyshape(degree, x))\n",
    "        \n",
    "        #Append the model and predicted values into the appropriate lists\n",
    "        predicted_values.append(___)\n",
    "        linear_models.append(___)\n",
    "    \n",
    "    #Return the 2 lists\n",
    "    return predicted_values, linear_models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "### edTest(test_gen) ###\n",
    "# Call the function gen twice with x and y as the predictor and response variable respectively\n",
    "# The number of bootstraps should be 200 and the number of samples from the dataset should be 30\n",
    "# Store the return values in appropriate variables\n",
    "\n",
    "# Get results for degree 1\n",
    "predicted_1, model_1 = gen(___);\n",
    "\n",
    "# Get results for degree 100\n",
    "predicted_100, model_100 = gen(___);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Helper code to plot the data\n",
    "\n",
    "indexes=np.sort(np.random.choice(x.shape[0], size=30, replace=False))\n",
    "\n",
    "plt.figure(figsize = (12,8))\n",
    "axes=make_plot()\n",
    "\n",
    "#Plot for Degree 1\n",
    "axes[0].plot(x,f,label=\"f\", color='darkblue',linewidth=4)\n",
    "axes[0].plot(x, y, '.', label=\"Population y\", color='#009193',markersize=8)\n",
    "axes[0].plot(x[indexes], y[indexes], 's', color='black', label=\"Data y\")\n",
    "\n",
    "for i,p in enumerate(predicted_1[:-1]):\n",
    "    axes[0].plot(x,p,alpha=0.03,color='#FF9300')\n",
    "axes[0].plot(x, predicted_1[-1], alpha=0.3,color='#FF9300',label=\"Degree 1 from different samples\")\n",
    "\n",
    "\n",
    "#Plot for Degree 100\n",
    "axes[1].plot(x,f,label=\"f\", color='darkblue',linewidth=4)\n",
    "axes[1].plot(x, y, '.', label=\"Population y\", color='#009193',markersize=8)\n",
    "axes[1].plot(x[indexes], y[indexes], 's', color='black', label=\"Data y\")\n",
    "\n",
    "\n",
    "for i,p in enumerate(predicted_100[:-1]):\n",
    "    axes[1].plot(x,p,alpha=0.03,color='#FF9300')\n",
    "axes[1].plot(x,predicted_100[-1],alpha=0.2,color='#FF9300',label=\"Degree 100 from different samples\")\n",
    "\n",
    "axes[0].legend(loc='best')\n",
    "axes[1].legend(loc='best')\n",
    "\n",
    "#edgecolor='black', linewidth=3, facecolor='green',\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### After you mark the exercise, run the code again, but this time with degree 10 instead of 100. Do you see a decrease in variance? Why are the edges still so erractic?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your answer here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Also check the values of the coefficients for each of your runs. Do you see a pattern? "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your answer here "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}