{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [
"## Title :\n",
"Exercise: Simple Multi-linear Regression\n",
"\n",
"## Description :\n",
"The aim of this exercise is to understand how to use multi-linear regression. We fit one model per combination of predictors and observe how the test MSE changes as the predictors change.\n",
"\n",
"## Data Description:\n",
"The dataset has the predictor columns `TV`, `Radio`, and `Newspaper` and the response column `Sales`.\n",
"\n",
"## Instructions:\n",
"\n",
"- Read the file `Advertising.csv` as a dataframe.\n",
"- For each combination of predictors, form a model. For example, with 2 predictors, A and B, you get 3 models: one with only A, one with only B, and one with both A and B.\n",
"- Split the data into train and test sets.\n",
"- Compute the MSE of each model on the test set.\n",
"- Print each predictor combination with its MSE.\n",
"\n",
"## Hints: \n",
"\n",
"`pd.read_csv(filename)`\n",
"Returns a pandas dataframe containing the data from the file.\n",
"\n",
"`sklearn.preprocessing.normalize()`\n",
"Scales input vectors individually to unit norm (vector length).\n",
"\n",
"`sklearn.model_selection.train_test_split()`\n",
"Splits the data into random train and test subsets.\n",
"\n",
"`sklearn.linear_model.LinearRegression`\n",
"Fits a linear model.\n",
"\n",
"`sklearn.linear_model.LinearRegression.fit()`\n",
"Fits the linear model to the training data.\n",
"\n",
"`sklearn.linear_model.LinearRegression.predict()`\n",
"Predicts using the linear model.\n",
"\n",
"`sklearn.metrics.mean_squared_error()`\n",
"Computes the mean squared error regression loss.\n",
"\n",
"**Note:** This exercise is auto-graded and you can try multiple attempts.
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import necessary libraries\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from sklearn import preprocessing\n", "from prettytable import PrettyTable\n", "from sklearn.metrics import mean_squared_error\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.model_selection import train_test_split\n", "%matplotlib inline\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reading the dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Read the file \"Advertising.csv\"\n", "df = pd.read_csv(\"Advertising.csv\")\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>TV</th><th>Radio</th><th>Newspaper</th><th>Sales</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>230.1</td><td>37.8</td><td>69.2</td><td>22.1</td></tr>\n",
"    <tr><th>1</th><td>44.5</td><td>39.3</td><td>45.1</td><td>10.4</td></tr>\n",
"    <tr><th>2</th><td>17.2</td><td>45.9</td><td>69.3</td><td>9.3</td></tr>\n",
"    <tr><th>3</th><td>151.5</td><td>41.3</td><td>58.5</td><td>18.5</td></tr>\n",
"    <tr><th>4</th><td>180.8</td><td>10.8</td><td>58.4</td><td>12.9</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " TV Radio Newspaper Sales\n", "0 230.1 37.8 69.2 22.1\n", "1 44.5 39.3 45.1 10.4\n", "2 17.2 45.9 69.3 9.3\n", "3 151.5 41.3 58.5 18.5\n", "4 180.8 10.8 58.4 12.9" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Take a quick look at the data to list all the predictors\n", "df.head()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create different multi predictor models " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "### edTest(test_mse) ###\n", "\n", "# Initialize a list to store the MSE values\n", "mse_list = []\n", "\n", "# List of all predictor combinations to fit the curve\n", "cols = [['TV'],['Radio'],['Newspaper'],['TV','Radio'],['TV','Newspaper'],['Radio','Newspaper'],['TV','Radio','Newspaper']]\n", "\n", "# Loop over all the predictor combinations \n", "for i in cols:\n", "\n", " # Set each of the predictors from the previous list as x\n", " x = df[i]\n", " \n", " \n", " # Set the \"Sales\" column as the reponse variable\n", " y = df[\"Sales\"]\n", " \n", " \n", " # Split the data into train-test sets with 80% training data and 20% testing data. 
\n", " # Set random_state as 0\n", " x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)\n", "\n", " # Initialize a Linear Regression model\n", " lreg = LinearRegression()\n", "\n", " # Fit the linear model on the train data\n", " lreg.fit(x_train, y_train)\n", " \n", " # Predict the response variable for the test set using the trained model\n", " y_pred= lreg.predict(x_test)\n", " \n", " # Compute the MSE for the test data\n", " MSE = mean_squared_error(y_test, y_pred)\n", " \n", " # Append the computed MSE to the list\n", " mse_list.append(MSE)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Display the MSE with predictor combinations" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+------------------------------+-------------------+\n", "| Predictors | MSE |\n", "+------------------------------+-------------------+\n", "| ['TV'] | 10.18618193453022 |\n", "| ['Radio'] | 24.23723303713214 |\n", "| ['Newspaper'] | 32.13714634300907 |\n", "| ['TV', 'Radio'] | 4.391429763581883 |\n", "| ['TV', 'Newspaper'] | 8.687682675690592 |\n", "| ['Radio', 'Newspaper'] | 24.78339548293816 |\n", "| ['TV', 'Radio', 'Newspaper'] | 4.402118291449686 |\n", "+------------------------------+-------------------+\n" ] } ], "source": [ "# Helper code to display the MSE for each predictor combination\n", "t = PrettyTable(['Predictors', 'MSE'])\n", "\n", "for i in range(len(mse_list)):\n", " t.add_row([cols[i],mse_list[i]])\n", "\n", "print(t)\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": 
"text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 2 }