{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Title :\n",
"Exercise: Variation of Coefficients for Lasso and Ridge Regression\n",
"\n",
"## Description :\n",
"\n",
"The goal of this exercise is to understand the variation of the coefficients of predictors with varying values of regularization parameter in Lasso and Ridge regularization.\n",
"\n",
"Below is a sample plot for Ridge ($L_2$ regularization)\n",
"\n",
"
\n",
"\n",
"## Data Description:\n",
"\n",
"## Instructions:\n",
"\n",
"- Read the dataset `bateria_train.csv` and assign the predictor and response variables.\n",
"- The predictor is the 'Spreading factor' and the response variable is the 'Perc_population'\n",
"- Use a maximum degree of 7 to make polynomial features and make a new predictor x_poly\n",
"- Make a list of alpha values.\n",
"- For each value of `$\\alpha$`:\n",
" - Fit a multi-linear regression using $L_2$ regularization\n",
" - Compute the coefficient of the predictors and store to the plot later\n",
"- Make a plot of the coefficients along with the alpha values\n",
"- Make a new alpha list as per the code in the exercise\n",
"- Implement Lasso regularization by repeating the above steps for each value of alpha\n",
"- Make another plot of the coefficients along with the new alpha values\n",
"\n",
"## Hints: \n",
"\n",
"np.linspace()\n",
"Return evenly spaced numbers over a specified interval.\n",
"\n",
"np.transpose()\n",
"Reverse or permute the axes of an array; returns the modified array.\n",
"\n",
"sklearn.PolynomialFeatures()\n",
"Generates a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree.\n",
"\n",
"sklearn.fit_transform()\n",
"Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.\n",
"\n",
"sklearn.LinearRegression()\n",
"LinearRegression fits a linear model.\n",
"\n",
"sklearn.fit()\n",
"Fits the linear model to the training data.\n",
"\n",
"sklearn.predict()\n",
"Predict using the linear modReturns the coefficient of the predictors in the model.\n",
"\n",
"mean_squared_error()\n",
"Mean squared error regression loss.\n",
"\n",
"sklearn.coef_\n",
"Returns the coefficients of the predictors.\n",
"\n",
"sklearn.Lasso()\n",
"Linear Model trained with L1 prior as a regularizer.\n",
"\n",
"sklearn.Ridge()\n",
"Linear least squares with L2 regularization.\n",
"\n",
"**Note:** This exercise is auto-graded and you can try multiple attempts. "
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Import necessary libraries\n",
"%matplotlib inline\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.linear_model import Lasso\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.preprocessing import PolynomialFeatures\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Helper code to alter plot properties\n",
"large = 22; med = 16; small = 10\n",
"params = {'axes.titlesize': large,\n",
" 'legend.fontsize': med,\n",
" 'figure.figsize': (16, 10),\n",
" 'axes.labelsize': med,\n",
" 'axes.titlesize': med,\n",
" 'axes.linewidth': 2,\n",
" 'xtick.labelsize': med,\n",
" 'ytick.labelsize': med,\n",
" 'figure.titlesize': large}\n",
"plt.style.use('seaborn-white')\n",
"plt.rcParams.update(params)\n",
"%matplotlib inline\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Read the file \"bacteria_train.csv\" as a dataframe\n",
"df = pd.read_csv(\"bacteria_train.csv\")\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Take a quick look of your dataset\n",
"df.head()\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Set the values of 'Spreading_factor' as the predictor \n",
"x = df[[___]]\n",
"\n",
"# Set the values of 'Perc_population' as the response \n",
"y = df[___]\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Select the degree of the polynomial\n",
"maxdeg = 4\n",
"\n",
"# Compute the polynomial features on the data\n",
"x_poly = PolynomialFeatures(___).fit_transform(___)\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Get a list of 1000 alpha values ranging from 10 to 120 \n",
"# np.linspace is inclusive by default unlike arange\n",
"alpha_list = np.linspace(___,___,___)\n"
]
},
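  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick illustration of the endpoint behaviour mentioned in the comment above (toy values, not the ones to fill in):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "np.linspace(0, 1, 5)    # array([0.  , 0.25, 0.5 , 0.75, 1.  ])  -- stop value included\n",
    "np.arange(0, 1, 0.25)   # array([0.  , 0.25, 0.5 , 0.75])        -- stop value excluded\n",
    "```"
   ]
  },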
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_ridge_fit) ###\n",
"# Make an empty list called coeff_list to store the coefficients of each model\n",
"coeff_list = []\n",
"\n",
"# Loop over all alpha values\n",
"for i in alpha_list:\n",
"\n",
" # Initialize a Ridge regression object with the current alpha value\n",
" # and set normalize as True\n",
" ridge_reg = Ridge(alpha=___,normalize=___)\n",
"\n",
" # Fit on the transformed data\n",
" ridge_reg.fit(___, ___)\n",
" \n",
" # Append the coeff_list with the coefficients of the trained model\n",
" coeff_list.append(___)\n"
]
},
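  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Compatibility note:** the `normalize` argument of `Ridge` and `Lasso` was deprecated in scikit-learn 1.0 and removed in 1.2. If you are running a newer version, a rough (not identical) substitute is to scale the polynomial features yourself, for example:\n",
    "\n",
    "```python\n",
    "from sklearn.linear_model import Ridge\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# Standardize the features, then fit Ridge; the coefficients are then\n",
    "# reported on the standardized scale, so the plots will differ in scale\n",
    "ridge_pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))\n",
    "```"
   ]
  },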
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Take the transpose of the list to get the variation in the \n",
"# coefficient values per degree\n",
"trend = np.array(coeff_list).T\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Helper code to plot the variation of the coefficients as per the alpha value\n",
"\n",
"# Just adding some nice colors. make sure to comment this cell out if you plan to use degree more than 7\n",
"colors = ['#5059E8','#9FC131FF','#D91C1C','#9400D3','#FF2F92','#336600','black']\n",
"\n",
"fig, ax = plt.subplots(figsize = (10,6))\n",
"for i in range(maxdeg):\n",
" ax.plot(alpha_list,np.abs(trend[i+1]),color=colors[i],alpha = 0.9,label = f'Degree {i+1}',lw=2.2)\n",
" ax.legend(loc='best',fontsize=10)\n",
" ax.set_xlabel(r'$\\alpha$ values', fontsize=20)\n",
" ax.set_ylabel(r'$\\beta$ values', fontsize=20)\n",
"fig.suptitle(r'Ridge ($L_2$) Regression')\n",
"plt.show();\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Compare the results of Ridge regression with the Lasso variant"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Select a list of 1000 alpha values ranging from 1e-4 to 1e-1 \n",
"alpha_list = np.linspace(___,___,___)\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_lasso_fit) ###\n",
"\n",
"# Initialize a list called to store the alpha value of each model\n",
"coeff_list = []\n",
"\n",
"# Loop over all the alpha values\n",
"for i in alpha_list:\n",
"\n",
" # Initialize a Lasso regression model with the current alpha\n",
" # Set normalize as True\n",
" lasso_reg = Lasso(alpha=___, max_iter=250000, normalize=___)\n",
"\n",
" # Fit on the transformed data\n",
" lasso_reg.fit(___, ___)\n",
" \n",
" # Append the coeff_list with the coefficients of the model\n",
" coeff_list.append(___)\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Get the transpose of the list to get the variation in the \n",
"# coefficient values per degree\n",
"trend = np.array(coeff_list).T\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Helper code below to plot the variation of the coefficients as per the alpha value\n",
"colors = ['#5059E8','#9FC131FF','#D91C1C','#9400D3','#FF2F92','#336600','black']\n",
"\n",
"fig, ax = plt.subplots(figsize = (10,6))\n",
"for i in range(maxdeg):\n",
" ax.plot(alpha_list,np.abs(trend[i+1]),color=colors[i],alpha = 0.9,label = f'Degree {i+1}',lw=2)\n",
" ax.legend(loc='best',fontsize=10)\n",
" ax.set_xlabel(r'$\\alpha$ values', fontsize=20)\n",
" ax.set_ylabel(r'$\\beta$ values', fontsize=20)\n",
"\n",
"fig.suptitle(r'Lasso ($L_1$) Regression')\n",
"plt.show();\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}