{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Title :\n", "Exercise: Variation of Coefficients for Lasso and Ridge Regression\n", "\n", "## Description :\n", "\n", "The goal of this exercise is to understand the variation of the coefficients of predictors with varying values of regularization parameter in Lasso and Ridge regularization.\n", "\n", "Below is a sample plot for Ridge ($L_2$ regularization)\n", "\n", "\n", "\n", "## Data Description:\n", "\n", "## Instructions:\n", "\n", "- Read the dataset `bateria_train.csv` and assign the predictor and response variables.\n", "- The predictor is the 'Spreading factor' and the response variable is the 'Perc_population'\n", "- Use a maximum degree of 7 to make polynomial features and make a new predictor x_poly\n", "- Make a list of alpha values.\n", "- For each value of `$\\alpha$`:\n", " - Fit a multi-linear regression using $L_2$ regularization\n", " - Compute the coefficient of the predictors and store to the plot later\n", "- Make a plot of the coefficients along with the alpha values\n", "- Make a new alpha list as per the code in the exercise\n", "- Implement Lasso regularization by repeating the above steps for each value of alpha\n", "- Make another plot of the coefficients along with the new alpha values\n", "\n", "## Hints: \n", "\n", "np.linspace()\n", "Return evenly spaced numbers over a specified interval.\n", "\n", "np.transpose()\n", "Reverse or permute the axes of an array; returns the modified array.\n", "\n", "sklearn.PolynomialFeatures()\n", "Generates a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree.\n", "\n", "sklearn.fit_transform()\n", "Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.\n", "\n", "sklearn.LinearRegression()\n", "LinearRegression fits a linear model.\n", "\n", "sklearn.fit()\n", "Fits the linear model to the training data.\n", "\n", "sklearn.predict()\n", "Predict using the linear modReturns the coefficient of the predictors in the model.\n", "\n", "mean_squared_error()\n", "Mean squared error regression loss.\n", "\n", "sklearn.coef_\n", "Returns the coefficients of the predictors.\n", "\n", "sklearn.Lasso()\n", "Linear Model trained with L1 prior as a regularizer.\n", "\n", "sklearn.Ridge()\n", "Linear least squares with L2 regularization.\n", "\n", "**Note:** This exercise is auto-graded and you can try multiple attempts. 
" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Import necessary libraries\n", "%matplotlib inline\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.linear_model import Lasso\n", "from sklearn.linear_model import Ridge\n", "from sklearn.preprocessing import PolynomialFeatures\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Helper code to alter plot properties\n", "large = 22; med = 16; small = 10\n", "params = {'axes.titlesize': large,\n", " 'legend.fontsize': med,\n", " 'figure.figsize': (16, 10),\n", " 'axes.labelsize': med,\n", " 'axes.titlesize': med,\n", " 'axes.linewidth': 2,\n", " 'xtick.labelsize': med,\n", " 'ytick.labelsize': med,\n", " 'figure.titlesize': large}\n", "plt.style.use('seaborn-white')\n", "plt.rcParams.update(params)\n", "%matplotlib inline\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Read the file \"bacteria_train.csv\" as a dataframe\n", "df = pd.read_csv(\"bacteria_train.csv\")\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Take a quick look of your dataset\n", "df.head()\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Set the values of 'Spreading_factor' as the predictor \n", "x = df[[___]]\n", "\n", "# Set the values of 'Perc_population' as the response \n", "y = df[___]\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Select the degree of the polynomial\n", "maxdeg = 4\n", "\n", "# Compute the polynomial features on the data\n", "x_poly = PolynomialFeatures(___).fit_transform(___)\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Get a list of 1000 alpha values ranging from 10 to 120 \n", "# np.linspace is inclusive by default unlike arange\n", "alpha_list = np.linspace(___,___,___)\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "### edTest(test_ridge_fit) ###\n", "# Make an empty list called coeff_list to store the coefficients of each model\n", "coeff_list = []\n", "\n", "# Loop over all alpha values\n", "for i in alpha_list:\n", "\n", " # Initialize a Ridge regression object with the current alpha value\n", " # and set normalize as True\n", " ridge_reg = Ridge(alpha=___,normalize=___)\n", "\n", " # Fit on the transformed data\n", " ridge_reg.fit(___, ___)\n", " \n", " # Append the coeff_list with the coefficients of the trained model\n", " coeff_list.append(___)\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Take the transpose of the list to get the variation in the \n", "# coefficient values per degree\n", "trend = np.array(coeff_list).T\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Helper code to plot the variation of the coefficients as per the alpha value\n", "\n", "# Just adding some nice colors. 
Add more colors to this list if you plan to use a degree greater than 7\n", "colors = ['#5059E8','#9FC131FF','#D91C1C','#9400D3','#FF2F92','#336600','black']\n", "\n", "fig, ax = plt.subplots(figsize = (10,6))\n", "for i in range(maxdeg):\n", " ax.plot(alpha_list,np.abs(trend[i+1]),color=colors[i],alpha = 0.9,label = f'Degree {i+1}',lw=2.2)\n", "\n", "ax.legend(loc='best',fontsize=10)\n", "ax.set_xlabel(r'$\\alpha$ values', fontsize=20)\n", "ax.set_ylabel(r'$\\beta$ values', fontsize=20)\n", "fig.suptitle(r'Ridge ($L_2$) Regression')\n", "plt.show();\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compare the results of Ridge regression with the Lasso variant" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Select a list of 1000 alpha values ranging from 1e-4 to 1e-1 \n", "alpha_list = np.linspace(___,___,___)\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "### edTest(test_lasso_fit) ###\n", "\n", "# Make an empty list called coeff_list to store the coefficients of each model\n", "coeff_list = []\n", "\n", "# Loop over all the alpha values\n", "for i in alpha_list:\n", "\n", " # Initialize a Lasso regression model with the current alpha\n", " # Set normalize as True\n", " lasso_reg = Lasso(alpha=___, max_iter=250000, normalize=___)\n", "\n", " # Fit on the transformed data\n", " lasso_reg.fit(___, ___)\n", " \n", " # Append the coefficients of the trained model to coeff_list\n", " coeff_list.append(___)\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Get the transpose of the list to get the variation in the \n", "# coefficient values per degree\n", "trend = np.array(coeff_list).T\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Helper code below to plot the variation of the coefficients as a function of the alpha value\n", "colors = ['#5059E8','#9FC131FF','#D91C1C','#9400D3','#FF2F92','#336600','black']\n", "\n", "fig, ax = plt.subplots(figsize = (10,6))\n", "for i in range(maxdeg):\n", " ax.plot(alpha_list,np.abs(trend[i+1]),color=colors[i],alpha = 0.9,label = f'Degree {i+1}',lw=2)\n", "\n", "ax.legend(loc='best',fontsize=10)\n", "ax.set_xlabel(r'$\\alpha$ values', fontsize=20)\n", "ax.set_ylabel(r'$\\beta$ values', fontsize=20)\n", "\n", "fig.suptitle(r'Lasso ($L_1$) Regression')\n", "plt.show();\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }