{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Title\n", "\n", "**Exercise: B.1 - Simple Multi-linear Regression**\n", "\n", "# Description\n", "The aim of this exercise is to understand how to use multi-linear regression. Here we will observe how the MSE of each model changes as the predictors change.\n", "\n", "# Instructions:\n", "- Read the file Advertising.csv as a dataframe.\n", "- For each combination of predictors, form a model. For example, if you have 2 predictors, A and B, you will end up with 3 models: one with only A, one with only B, and one with both A and B.\n", "- Split the data into train and test sets.\n", "- Compute the MSE of each model.\n", "- Print each predictor combination with its MSE value.\n", "\n", "\n", "# Hints:\n", "\n", "pd.read_csv(filename) : Returns a pandas dataframe containing the data and labels from the file\n", "\n", "sklearn.preprocessing.normalize() : Scales input vectors individually to unit norm (vector length).\n", "\n", "np.interp() : Returns one-dimensional linear interpolation\n", "\n", "sklearn.model_selection.train_test_split() : Splits the data into random train and test subsets\n", "\n", "sklearn.linear_model.LinearRegression() : LinearRegression fits a linear model\n", "\n", "LinearRegression.fit() : Fits the linear model to the training data\n", "\n", "LinearRegression.predict() : Predicts using the linear model\n", "\n", "\n", "Note: This exercise is **auto-graded and you can make multiple attempts.**" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "#import necessary libraries\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.metrics import mean_squared_error\n", "from sklearn.model_selection import train_test_split\n", "from sklearn import preprocessing\n", "from prettytable import PrettyTable" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "### Reading the dataset" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "#Read the file \"Advertising.csv\"\n", "df = pd.read_csv(\"Advertising.csv\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>TV</th><th>Radio</th><th>Newspaper</th><th>Sales</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>230.1</td><td>37.8</td><td>69.2</td><td>22.1</td></tr>\n",
"    <tr><th>1</th><td>44.5</td><td>39.3</td><td>45.1</td><td>10.4</td></tr>\n",
"    <tr><th>2</th><td>17.2</td><td>45.9</td><td>69.3</td><td>9.3</td></tr>\n",
"    <tr><th>3</th><td>151.5</td><td>41.3</td><td>58.5</td><td>18.5</td></tr>\n",
"    <tr><th>4</th><td>180.8</td><td>10.8</td><td>58.4</td><td>12.9</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " TV Radio Newspaper Sales\n", "0 230.1 37.8 69.2 22.1\n", "1 44.5 39.3 45.1 10.4\n", "2 17.2 45.9 69.3 9.3\n", "3 151.5 41.3 58.5 18.5\n", "4 180.8 10.8 58.4 12.9" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Take a quick look at the data to list all the predictors\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create different multi-predictor models" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "### edTest(test_mse) ###\n", "#List to store the MSE values\n", "mse_list = []\n", "\n", "#List of all predictor combinations to fit models on\n", "cols = [['TV'],['Radio'],['Newspaper'],['TV','Radio'],['TV','Newspaper'],['Radio','Newspaper'],['TV','Radio','Newspaper']]\n", "\n", "for i in cols:\n", " #Set each of the predictors from the previous list as x\n", " x = df[___]\n", " \n", " \n", " #\"Sales\" column is the response variable\n", " y = df[___]\n", " \n", " \n", " #Split the data into train and test sets with 80% training data and 20% testing data. 
\n", " #Set random_state as 0\n", " xtrain, xtest, ytrain, ytest = train_test_split(___)\n", "\n", " #Create a LinearRegression object and fit the model\n", " lreg = LinearRegression()\n", " lreg.fit(___)\n", " \n", " #Predict the response variable for the test set\n", " y_pred= lreg.predict(___)\n", " \n", " #Compute the MSE\n", " MSE = mean_squared_error(___)\n", " \n", " #Append the MSE to the list\n", " mse_list.append(___)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Display the MSE with predictor combinations" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "t = PrettyTable(['Predictors', 'MSE'])\n", "\n", "#Loop to display the predictor combinations along with the MSE value of the corresponding model\n", "for i in range(len(mse_list)):\n", " t.add_row([cols[i],mse_list[i]])\n", "\n", "print(t)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comment on the trend of MSE values with changing predictor(s) combinations. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Your answer here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }