{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Title\n",
"\n",
"**Exercise: 2 - Simple Lasso and Ridge Regularization**\n",
"\n",
"# Description\n",
"\n",
"The aim of this exercise is to understand **Lasso and Ridge regularization**.\n",
"\n",
"Plot Predictor vs Coefficient as a horizontal bar chart. The graph may look similar to the one given below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Instructions:\n",
"- Read the dataset and assign the predictor and response variables.\n",
"- Split the dataset into train and validation sets\n",
"- Fit a multi-linear regression model\n",
"- Compute the validation MSE of the model\n",
"- Compute the coefficient of the predictors and store to the plot later\n",
"- Implement Lasso regularization by specifying an alpha value. Repeat steps 4 and 5\n",
"- Implement Ridge regularization by specifying the same alpha value. Repeat steps 4 and 5\n",
"- Plot the coefficient of all the 3 models in one graph as shown above\n",
"\n",
"# Hints:\n",
"np.transpose() : Reverse or permute the axes of an array; returns the modified array\n",
"\n",
"sklearn.normalize() : Scales input vectors individually to the unit norm (vector length)\n",
"\n",
"sklearn.train_test_split() : Splits the data into random train and test subsets\n",
"\n",
"sklearn.PolynomialFeatures() : Generates a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree\n",
"\n",
"sklearn.fit_transform() : Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X\n",
"\n",
"sklearn.LinearRegression() : LinearRegression fits a linear model\n",
"\n",
"sklearn.fit() : Fits the linear model to the training data\n",
"\n",
"sklearn.predict() : Predict using the linear modReturns the coefficient of the predictors in the model.\n",
"\n",
"mean_squared_error() : Mean squared error regression loss\n",
"\n",
"sklearn.coef_ : Returns the coefficients of the predictors\n",
"\n",
"plt.subplots() : Create a figure and a set of subplots\n",
"\n",
"ax.barh() : Make a horizontal bar plot\n",
"\n",
"ax.set_xlim() : Sets the x-axis view limits\n",
"\n",
"sklearn.Lasso() : Linear Model trained with L1 prior as a regularizer\n",
"\n",
"sklearn.Ridge() : Linear least squares with L2 regularization\n",
"\n",
"zip() : Makes an iterator that aggregates elements from each of the iterables.\n",
"\n",
"**Note: This exercise is auto-graded and you can try multiple attempts.**"
]
},
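{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before starting, here is a small self-contained sketch (not part of the graded exercise) of how a few of the hinted functions fit together on synthetic data. The array shapes, predictor names, and alpha value below are arbitrary choices for illustration only:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.linear_model import Lasso\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Synthetic data: 100 samples, 3 made-up predictors\n",
"rng = np.random.default_rng(0)\n",
"X_demo = rng.normal(size=(100, 3))\n",
"y_demo = X_demo @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.1, size=100)\n",
"\n",
"# Split, fit a Lasso model with an illustrative alpha, and score it\n",
"X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)\n",
"demo_lasso = Lasso(alpha=0.01)\n",
"demo_lasso.fit(X_tr, y_tr)\n",
"print(mean_squared_error(y_te, demo_lasso.predict(X_te)))\n",
"\n",
"# Pair predictor names with their fitted coefficients using zip()\n",
"print(dict(zip(['p1', 'p2', 'p3'], demo_lasso.coef_)))\n",
"```"
]
},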
{
"cell_type": "code",
"execution_count": 99,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"# Import libraries\n",
"%matplotlib inline\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from sklearn import preprocessing\n",
"from sklearn.linear_model import Lasso\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import PolynomialFeatures"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reading the dataset"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"# Read the file \"Boston_housing.csv\" as a dataframe\n",
"\n",
"df = pd.read_csv(\"Boston_housing.csv\")\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Predictors & Response variables\n",
"\n",
"#### Select the 'medv' column as response variable and the rest of the columns as predictors.\n",
"\n",
"As such, all the following columns are predictors:\n",
"- crim\n",
"- indus\n",
"- nox\n",
"- rm\n",
"- age\n",
"- dis\n",
"- rad\n",
"- tax\n",
"- ptratio\n",
"- black\n",
"- lstat\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [],
"source": [
"# Select a subdataframe of predictors mentioned above\n",
"\n",
"X = df[___]\n",
"\n",
"# Normalize the values of the dataframe \n",
"\n",
"X_norm = preprocessing.normalize(___)\n",
"\n",
"# Select medv as the response variable\n",
"\n",
"y = df[___]"
]
},
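{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, one possible way to fill in the blanks above, using the column names listed earlier (a sketch, not necessarily the exact graded solution, and it assumes the cells above have been run):\n",
"\n",
"```python\n",
"# Predictor columns named in the markdown cell above\n",
"predictor_cols = ['crim', 'indus', 'nox', 'rm', 'age', 'dis',\n",
"                  'rad', 'tax', 'ptratio', 'black', 'lstat']\n",
"X = df[predictor_cols]\n",
"\n",
"# Scale each sample to unit norm, as described in the normalize() hint\n",
"X_norm = preprocessing.normalize(X)\n",
"\n",
"# Response variable\n",
"y = df['medv']\n",
"```"
]
},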
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Split the dataset into train and validation sets\n",
"\n",
"Keep the test size as 30% of the dataset, and use ```random_state```=31"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_random) ###\n",
"# Split the data into train and validation sets\n",
"\n",
"X_train, X_val, y_train, y_val = train_test_split(___)\n"
]
},
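{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible completion of the split, using the 30% test size and random_state=31 specified above (assuming X_norm and y from the previous step):\n",
"\n",
"```python\n",
"X_train, X_val, y_train, y_val = train_test_split(\n",
"    X_norm, y, test_size=0.3, random_state=31)\n",
"```"
]
},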
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multi-linear Regression Analysis"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"#Fit a linear regression model on the training data\n",
"\n",
"lreg = LinearRegression()\n",
"lreg.fit(___)\n",
"\n",
"# Predict on the validation set\n",
"\n",
"y_val_pred = lreg.predict(___)"
]
},
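{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of this step, assuming the train/validation split from above: fit on the training data, then predict on the validation predictors.\n",
"\n",
"```python\n",
"lreg = LinearRegression()\n",
"lreg.fit(X_train, y_train)\n",
"y_val_pred = lreg.predict(X_val)\n",
"```"
]
},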
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Computing the MSE for Multi-Linear Regression"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"# Use the mean_squared_error function to compute the validation mse\n",
"\n",
"mse = mean_squared_error(___,___)\n",
"\n",
"# print the MSE value\n",
"\n",
"print (\"Multi-linear regression validation MSE is\", mse)"
]
},
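{
"cell_type": "markdown",
"metadata": {},
"source": [
"By sklearn convention, mean_squared_error takes the true values first and the predictions second; a sketch assuming y_val and y_val_pred from the cells above:\n",
"\n",
"```python\n",
"mse = mean_squared_error(y_val, y_val_pred)\n",
"print('Multi-linear regression validation MSE is', mse)\n",
"```"
]
},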
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Obtaining the coefficients of the predictors"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [],
"source": [
"#make a dictionary of the coefficients along with the predictors as keys\n",
"\n",
"lreg_coef = dict(zip(X.columns, np.transpose(lreg.coef_)))\n",
"\n",
"#Linear regression coefficient values to plot\n",
"\n",
"lreg_x = list(lreg_coef.keys())\n",
"lreg_y = list(lreg_coef.values())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Implementing Lasso regularization"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"# Now, you will implement the lasso regularisation\n",
"\n",
"# Use alpha = 0.001\n",
"\n",
"lasso_reg = Lasso(___)\n",
"\n",
"#Fit on training data\n",
"\n",
"lasso_reg.fit(___)\n",
"\n",
"#Make a prediction on the validation data using the above trained model\n",
"\n",
"y_val_pred =lasso_reg.predict(___)\n"
]
},
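{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the Lasso step with alpha = 0.001 as instructed, again assuming the split from the earlier cells:\n",
"\n",
"```python\n",
"lasso_reg = Lasso(alpha=0.001)\n",
"lasso_reg.fit(X_train, y_train)\n",
"y_val_pred = lasso_reg.predict(X_val)\n",
"```"
]
},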
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Computing the MSE with Lasso regularization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"# Again, calculate the validation MSE & print it\n",
"\n",
"mse_lasso = mean_squared_error(___,___)\n",
"\n",
"print (\"Lasso validation MSE is\", mse_lasso)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Obtaining the coefficients of the predictors"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"# Use the helper code below to make a dictionary of the predictors along with the coefficients associated with them\n",
"\n",
"lasso_coef = dict(zip(X.columns, np.transpose(lasso_reg.coef_))) \n",
"\n",
"#Lasso regularisation coefficient values to plot\n",
"\n",
"lasso_x = list(lasso_coef.keys())\n",
"lasso_y = list(lasso_coef.values())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Implementing Ridge regularization"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"# Now, we do the same as above, but we use L2 regularisation\n",
"\n",
"# Again, use alpha=0.001\n",
"\n",
"ridge_reg = Ridge(___)\n",
"\n",
"#Fit the model in the training data\n",
"ridge_reg.fit(___)\n",
"\n",
"#Predict the model on the validation data\n",
"y_val_pred = ridge_reg.predict(___)\n"
]
},
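{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Ridge step mirrors the Lasso step, swapping the L1 penalty for L2; a sketch with the same alpha of 0.001:\n",
"\n",
"```python\n",
"ridge_reg = Ridge(alpha=0.001)\n",
"ridge_reg.fit(X_train, y_train)\n",
"y_val_pred = ridge_reg.predict(X_val)\n",
"```"
]
},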
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Computing the MSE with Ridge regularization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"### edTest(test_mse) ###\n",
"\n",
"# Calculate the validation MSE & print it\n",
"\n",
"mse_ridge = mean_squared_error(___,___)\n",
"print (\"Ridge validation MSE is\", mse_ridge)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Obtaining the coefficients of the predictors"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"# Use the helper code below to make a dictionary of the predictors along with the coefficients associated with them \n",
"\n",
"ridge_coef = dict(zip(X.columns, np.transpose(ridge_reg.coef_))) \n",
"\n",
"#Ridge regularisation coefficient values to plot\n",
"\n",
"ridge_x = list(ridge_coef.keys())\n",
"ridge_y = list(ridge_coef.values())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plotting the graph"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [],
"source": [
"# Use the helper code below to visualise your results\n",
"\n",
"plt.rcdefaults()\n",
"\n",
"plt.barh(lreg_x,lreg_y,1.0, align='edge',color=\"#D3B4B4\", label=\"Linear Regression\")\n",
"plt.barh(lasso_x,lasso_y,0.75 ,align='edge',color=\"#81BDB2\",label = \"Lasso regularisation\")\n",
"plt.barh(ridge_x,ridge_y,0.25 ,align='edge',color=\"#7E7EC0\", label=\"Ridge regularisation\")\n",
"plt.grid(linewidth=0.2)\n",
"plt.xlabel(\"Coefficient\")\n",
"plt.ylabel(\"Predictors\")\n",
"plt.legend(loc='best')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Compare the results of linear regression with that of lasso and ridge regularization."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your answer here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### After marking, change the alpha values to 1, 10 and 1000. What happens to the coefficients when alpha increases?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your answer here"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}