{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Title\n", "\n", "**Exercise: 2 - Simple Lasso and Ridge Regularization**\n", "\n", "# Description\n", "\n", "The aim of this exercise is to understand **Lasso and Ridge regularization**.\n", "\n", "Plot Predictor vs Coefficient as a horizontal bar chart. The graph may look similar to the one given below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Instructions:\n", "- Read the dataset and assign the predictor and response variables.\n", "- Split the dataset into train and validation sets\n", "- Fit a multi-linear regression model\n", "- Compute the validation MSE of the model\n", "- Compute the coefficient of the predictors and store to the plot later\n", "- Implement Lasso regularization by specifying an alpha value. Repeat steps 4 and 5\n", "- Implement Ridge regularization by specifying the same alpha value. Repeat steps 4 and 5\n", "- Plot the coefficient of all the 3 models in one graph as shown above\n", "\n", "# Hints:\n", "np.transpose() : Reverse or permute the axes of an array; returns the modified array\n", "\n", "sklearn.normalize() : Scales input vectors individually to the unit norm (vector length)\n", "\n", "sklearn.train_test_split() : Splits the data into random train and test subsets\n", "\n", "sklearn.PolynomialFeatures() : Generates a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree\n", "\n", "sklearn.fit_transform() : Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X\n", "\n", "sklearn.LinearRegression() : LinearRegression fits a linear model\n", "\n", "sklearn.fit() : Fits the linear model to the training data\n", "\n", "sklearn.predict() : Predict using the linear modReturns the coefficient of the predictors in the model.\n", "\n", "mean_squared_error() : Mean squared error regression loss\n", "\n", "sklearn.coef_ : Returns the coefficients of the predictors\n", "\n", "plt.subplots() : Create a figure and a set of subplots\n", "\n", "ax.barh() : Make a horizontal bar plot\n", "\n", "ax.set_xlim() : Sets the x-axis view limits\n", "\n", "sklearn.Lasso() : Linear Model trained with L1 prior as a regularizer\n", "\n", "sklearn.Ridge() : Linear least squares with L2 regularization\n", "\n", "zip() : Makes an iterator that aggregates elements from each of the iterables.\n", "\n", "**Note: This exercise is auto-graded and you can try multiple attempts.**" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Import libraries\n", "%matplotlib inline\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn import preprocessing\n", "from sklearn.linear_model import Lasso\n", "from sklearn.linear_model import Ridge\n", "from sklearn.metrics import mean_squared_error\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import PolynomialFeatures" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reading the dataset" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Read the file \"Boston_housing.csv\" as a dataframe\n", "\n", "df = 
pd.read_csv(\"Boston_housing.csv\")\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predictors & Response variables\n", "\n", "#### Select the 'medv' column as response variable and the rest of the columns as predictors.\n", "\n", "As such, all the following columns are predictors:\n", "- crim\n", "- indus\n", "- nox\n", "- rm\n", "- age\n", "- dis\n", "- rad\n", "- tax\n", "- ptratio\n", "- black\n", "- lstat\n", "\n" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [], "source": [ "# Select a subdataframe of predictors mentioned above\n", "\n", "X = df[___]\n", "\n", "# Normalize the values of the dataframe \n", "\n", "X_norm = preprocessing.normalize(___)\n", "\n", "# Select medv as the response variable\n", "\n", "y = df[___]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Split the dataset into train and validation sets\n", "\n", "Keep the test size as 30% of the dataset, and use ```random_state```=31" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [], "source": [ "### edTest(test_random) ###\n", "# Split the data into train and validation sets\n", "\n", "X_train, X_val, y_train, y_val = train_test_split(___)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multi-linear Regression Analysis" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "#Fit a linear regression model on the training data\n", "\n", "lreg = LinearRegression()\n", "lreg.fit(___)\n", "\n", "# Predict on the validation set\n", "\n", "y_val_pred = lreg.predict(___)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing the MSE for Multi-Linear Regression" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Use the mean_squared_error function to compute the validation mse\n", "\n", "mse = mean_squared_error(___,___)\n", "\n", "# print the MSE value\n", "\n", "print (\"Multi-linear regression validation MSE is\", mse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Obtaining the coefficients of the predictors" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [], "source": [ "#make a dictionary of the coefficients along with the predictors as keys\n", "\n", "lreg_coef = dict(zip(X.columns, np.transpose(lreg.coef_)))\n", "\n", "#Linear regression coefficient values to plot\n", "\n", "lreg_x = list(lreg_coef.keys())\n", "lreg_y = list(lreg_coef.values())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Implementing Lasso regularization" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Now, you will implement the lasso regularisation\n", "\n", "# Use alpha = 0.001\n", "\n", "lasso_reg = Lasso(___)\n", "\n", "#Fit on training data\n", "\n", "lasso_reg.fit(___)\n", "\n", "#Make a prediction on the validation data using the above trained model\n", "\n", "y_val_pred =lasso_reg.predict(___)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing the MSE with Lasso regularization" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Again, calculate the validation MSE & print it\n", 
"\n", "mse_lasso = mean_squared_error(___,___)\n", "\n", "print (\"Lasso validation MSE is\", mse_lasso)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Obtaining the coefficients of the predictors" ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Use the helper code below to make a dictionary of the predictors along with the coefficients associated with them\n", "\n", "lasso_coef = dict(zip(X.columns, np.transpose(lasso_reg.coef_))) \n", "\n", "#Lasso regularisation coefficient values to plot\n", "\n", "lasso_x = list(lasso_coef.keys())\n", "lasso_y = list(lasso_coef.values())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Implementing Ridge regularization" ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Now, we do the same as above, but we use L2 regularisation\n", "\n", "# Again, use alpha=0.001\n", "\n", "ridge_reg = Ridge(___)\n", "\n", "#Fit the model in the training data\n", "ridge_reg.fit(___)\n", "\n", "#Predict the model on the validation data\n", "y_val_pred = ridge_reg.predict(___)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing the MSE with Ridge regularization" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "### edTest(test_mse) ###\n", "\n", "# Calculate the validation MSE & print it\n", "\n", "mse_ridge = mean_squared_error(___,___)\n", "print (\"Ridge validation MSE is\", mse_ridge)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Obtaining the coefficients of the predictors" ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Use the helper code below to make a dictionary of the predictors along with the coefficients associated with them \n", "\n", "ridge_coef = dict(zip(X.columns, np.transpose(ridge_reg.coef_))) \n", "\n", "#Ridge regularisation coefficient values to plot\n", "\n", "ridge_x = list(ridge_coef.keys())\n", "ridge_y = list(ridge_coef.values())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting the graph" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Use the helper code below to visualise your results\n", "\n", "plt.rcdefaults()\n", "\n", "plt.barh(lreg_x,lreg_y,1.0, align='edge',color=\"#D3B4B4\", label=\"Linear Regression\")\n", "plt.barh(lasso_x,lasso_y,0.75 ,align='edge',color=\"#81BDB2\",label = \"Lasso regularisation\")\n", "plt.barh(ridge_x,ridge_y,0.25 ,align='edge',color=\"#7E7EC0\", label=\"Ridge regularisation\")\n", "plt.grid(linewidth=0.2)\n", "plt.xlabel(\"Coefficient\")\n", "plt.ylabel(\"Predictors\")\n", "plt.legend(loc='best')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compare the results of linear regression with that of lasso and ridge regularization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your answer here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### After marking, change the alpha values to 1, 10 and 1000. What happens to the coefficients when alpha increases?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your answer here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }