{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Title\n",
"\n",
"**Exercise: 1 - Bias Variance Tradeoff**\n",
"\n",
"# Description\n",
"\n",
"The aim of this exercise is to understand **bias variance tradeoff**. For this, you will fit different degree polynomial regression on the same data and plot them as given below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Instructions:\n",
"- Read the file noisypopulation.csv as a pandas dataframe.\n",
"- Assign the response and predictor variables appropriately.\n",
"- Perform bootstrap operation on the dataset. \n",
"- For each bootstrap:\n",
" - For degree of the chosen degree value:\n",
" - Compute the polynomial features\n",
" - Fit the model on the given data\n",
" - Select a set of random points in the data to predict the model\n",
" - Store the predicted values as a list\n",
"- Plot the predicted values along with the random data points and true function as given above.\n",
"\n",
"\n",
"# Hints:\n",
"\n",
"sklearn.PolynomialFeatures() : Generates polynomial and interaction features\n",
"\n",
"sklearn.LinearRegression() : LinearRegression fits a linear model\n",
"\n",
"sklearn.fit() : Fits the linear model to the training data\n",
"\n",
"sklearn.predict() : Predict using the linear model.\n",
"\n",
"Note: This exercise is **auto-graded and you can try multiple attempts.**"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"#Importing libraries\n",
"%matplotlib inline\n",
"import numpy as np\n",
"import scipy as sp\n",
"import matplotlib as mpl\n",
"import matplotlib.cm as cm\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.preprocessing import PolynomialFeatures"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"#Helper function to set plot characteristics\n",
"\n",
"def make_plot():\n",
" fig, axes=plt.subplots(figsize=(20,8), nrows=1, ncols=2);\n",
" axes[0].set_ylabel(\"$p_R$\", fontsize=18)\n",
" axes[0].set_xlabel(\"$x$\", fontsize=18)\n",
" axes[1].set_xlabel(\"$x$\", fontsize=18)\n",
" axes[1].set_yticklabels([])\n",
" axes[0].set_ylim([0,1])\n",
" axes[1].set_ylim([0,1])\n",
" axes[0].set_xlim([0,1])\n",
" axes[1].set_xlim([0,1])\n",
" plt.tight_layout();\n",
" return axes"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | f | \n", "x | \n", "y | \n", "
---|---|---|---|
0 | \n", "0.047790 | \n", "0.00 | \n", "0.011307 | \n", "
1 | \n", "0.051199 | \n", "0.01 | \n", "0.010000 | \n", "
2 | \n", "0.054799 | \n", "0.02 | \n", "0.007237 | \n", "
3 | \n", "0.058596 | \n", "0.03 | \n", "0.000056 | \n", "
4 | \n", "0.062597 | \n", "0.04 | \n", "0.010000 | \n", "