{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Title\n", "\n", "**Exercise: 2 - Simple Lasso and Ridge Regularization**\n", "\n", "# Description\n", "\n", "The aim of this exercise is to understand **Lasso and Ridge regularization**.\n", "\n", "Plot Predictor vs Coefficient as a horizontal bar chart. The graph may look similar to the one given below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Instructions:\n", "- Read the dataset and assign the predictor and response variables.\n", "- Split the dataset into train and validation sets\n", "- Fit a multi-linear regression model\n", "- Compute the validation MSE of the model\n", "- Compute the coefficient of the predictors and store to the plot later\n", "- Implement Lasso regularization by specifying an alpha value. Repeat steps 4 and 5\n", "- Implement Ridge regularization by specifying the same alpha value. Repeat steps 4 and 5\n", "- Plot the coefficient of all the 3 models in one graph as shown above\n", "\n", "# Hints:\n", "np.transpose() : Reverse or permute the axes of an array; returns the modified array\n", "\n", "sklearn.normalize() : Scales input vectors individually to the unit norm (vector length)\n", "\n", "sklearn.train_test_split() : Splits the data into random train and test subsets\n", "\n", "sklearn.PolynomialFeatures() : Generates a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree\n", "\n", "sklearn.fit_transform() : Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X\n", "\n", "sklearn.LinearRegression() : LinearRegression fits a linear model\n", "\n", "sklearn.fit() : Fits the linear model to the training data\n", "\n", "sklearn.predict() : Predict using the linear modReturns the coefficient of the predictors in the model.\n", "\n", "mean_squared_error() : Mean squared error regression loss\n", "\n", "sklearn.coef_ : Returns the coefficients of the predictors\n", "\n", "plt.subplots() : Create a figure and a set of subplots\n", "\n", "ax.barh() : Make a horizontal bar plot\n", "\n", "ax.set_xlim() : Sets the x-axis view limits\n", "\n", "sklearn.Lasso() : Linear Model trained with L1 prior as a regularizer\n", "\n", "sklearn.Ridge() : Linear least squares with L2 regularization\n", "\n", "zip() : Makes an iterator that aggregates elements from each of the iterables.\n", "\n", "**Note: This exercise is auto-graded and you can try multiple attempts.**" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Import libraries\n", "%matplotlib inline\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn import preprocessing\n", "from sklearn.linear_model import Lasso\n", "from sklearn.linear_model import Ridge\n", "from sklearn.metrics import mean_squared_error\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import PolynomialFeatures" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reading the dataset" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Read the file \"Boston_housing.csv\" as a dataframe\n", "\n", "df = 
pd.read_csv(\"Boston_housing.csv\")\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predictors & Response variables\n", "\n", "#### Select the 'medv' column as response variable and the rest of the columns as predictors.\n", "\n", "As such, all the following columns are predictors:\n", "- crim\n", "- indus\n", "- nox\n", "- rm\n", "- age\n", "- dis\n", "- rad\n", "- tax\n", "- ptratio\n", "- black\n", "- lstat\n", "\n" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [], "source": [ "# Select a subdataframe of predictors mentioned above\n", "\n", "X = df[___]\n", "\n", "# Normalize the values of the dataframe \n", "\n", "X_norm = preprocessing.normalize(___)\n", "\n", "# Select medv as the response variable\n", "\n", "y = df[___]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Split the dataset into train and validation sets\n", "\n", "Keep the test size as 30% of the dataset, and use ```random_state```=31" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [], "source": [ "### edTest(test_random) ###\n", "# Split the data into train and validation sets\n", "\n", "X_train, X_val, y_train, y_val = train_test_split(___)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multi-linear Regression Analysis" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "#Fit a linear regression model on the training data\n", "\n", "lreg = LinearRegression()\n", "lreg.fit(___)\n", "\n", "# Predict on the validation set\n", "\n", "y_val_pred = lreg.predict(___)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing the MSE for Multi-Linear Regression" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Use the mean_squared_error function to compute the validation mse\n", "\n", "mse = mean_squared_error(___,___)\n", "\n", "# print the MSE value\n", "\n", "print (\"Multi-linear regression validation MSE is\", mse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Obtaining the coefficients of the predictors" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [], "source": [ "#make a dictionary of the coefficients along with the predictors as keys\n", "\n", "lreg_coef = dict(zip(X.columns, np.transpose(lreg.coef_)))\n", "\n", "#Linear regression coefficient values to plot\n", "\n", "lreg_x = list(lreg_coef.keys())\n", "lreg_y = list(lreg_coef.values())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Implementing Lasso regularization" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Now, you will implement the lasso regularisation\n", "\n", "# Use alpha = 0.001\n", "\n", "lasso_reg = Lasso(___)\n", "\n", "#Fit on training data\n", "\n", "lasso_reg.fit(___)\n", "\n", "#Make a prediction on the validation data using the above trained model\n", "\n", "y_val_pred =lasso_reg.predict(___)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing the MSE with Lasso regularization" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Again, calculate the validation MSE & print it\n", 
"\n", "mse_lasso = mean_squared_error(___,___)\n", "\n", "print (\"Lasso validation MSE is\", mse_lasso)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Obtaining the coefficients of the predictors" ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Use the helper code below to make a dictionary of the predictors along with the coefficients associated with them\n", "\n", "lasso_coef = dict(zip(X.columns, np.transpose(lasso_reg.coef_))) \n", "\n", "#Lasso regularisation coefficient values to plot\n", "\n", "lasso_x = list(lasso_coef.keys())\n", "lasso_y = list(lasso_coef.values())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Implementing Ridge regularization" ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Now, we do the same as above, but we use L2 regularisation\n", "\n", "# Again, use alpha=0.001\n", "\n", "ridge_reg = Ridge(___)\n", "\n", "#Fit the model in the training data\n", "ridge_reg.fit(___)\n", "\n", "#Predict the model on the validation data\n", "y_val_pred = ridge_reg.predict(___)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing the MSE with Ridge regularization" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "### edTest(test_mse) ###\n", "\n", "# Calculate the validation MSE & print it\n", "\n", "mse_ridge = mean_squared_error(___,___)\n", "print (\"Ridge validation MSE is\", mse_ridge)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Obtaining the coefficients of the predictors" ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Use the helper code below to make a dictionary of the predictors along with the coefficients associated with them \n", "\n", "ridge_coef = dict(zip(X.columns, np.transpose(ridge_reg.coef_))) \n", "\n", "#Ridge regularisation coefficient values to plot\n", "\n", "ridge_x = list(ridge_coef.keys())\n", "ridge_y = list(ridge_coef.values())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting the graph" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Use the helper code below to visualise your results\n", "\n", "plt.rcdefaults()\n", "\n", "plt.barh(lreg_x,lreg_y,1.0, align='edge',color=\"#D3B4B4\", label=\"Linear Regression\")\n", "plt.barh(lasso_x,lasso_y,0.75 ,align='edge',color=\"#81BDB2\",label = \"Lasso regularisation\")\n", "plt.barh(ridge_x,ridge_y,0.25 ,align='edge',color=\"#7E7EC0\", label=\"Ridge regularisation\")\n", "plt.grid(linewidth=0.2)\n", "plt.xlabel(\"Coefficient\")\n", "plt.ylabel(\"Predictors\")\n", "plt.legend(loc='best')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compare the results of linear regression with that of lasso and ridge regularization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your answer here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### After marking, change the alpha values to 1, 10 and 1000. What happens to the coefficients when alpha increases?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your answer here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }