Key Word(s): Linear Regression


Title

Exercise: B.1 - Simple Multi-linear Regression

Description

The aim of this exercise is to understand how to use multi-linear regression. Here we will observe how the MSE of each model changes as the set of predictors changes.

Instructions:

  • Read the file Advertising.csv as a dataframe.
  • For each combination of predictors, form a model. For example, if you have 2 predictors, A and B, you will end up with 3 models - one with only A, one with only B, and one with both A and B. (One way to enumerate these combinations programmatically is sketched after this list.)
  • Split the data into train and test sets.
  • Compute the MSE of each model.
  • Print each Predictor - MSE value pair.
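
If you prefer to generate the predictor combinations programmatically rather than typing them out by hand, a minimal sketch using itertools (assuming the three column names TV, Radio, and Newspaper from the dataset used below) could look like this:

from itertools import combinations

# Column names assumed from the Advertising dataset used in this exercise
predictors = ['TV', 'Radio', 'Newspaper']

# Every non-empty subset: 3 single-predictor sets, 3 pairs, and 1 set with all three
cols = [list(c) for r in range(1, len(predictors) + 1)
        for c in combinations(predictors, r)]

print(cols)
# [['TV'], ['Radio'], ['Newspaper'], ['TV', 'Radio'], ['TV', 'Newspaper'],
#  ['Radio', 'Newspaper'], ['TV', 'Radio', 'Newspaper']]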

Hints:

pd.read_csv(filename) : Returns a pandas dataframe containing the data and labels from the file

sklearn.preprocessing.normalize() : Scales input vectors individually to unit norm (vector length).

np.interp() : Returns one-dimensional linear interpolation

sklearn.model_selection.train_test_split() : Splits the data into random train and test subsets

sklearn.linear_model.LinearRegression() : Fits a linear regression model

LinearRegression.fit() : Fits the linear model to the training data

LinearRegression.predict() : Predicts using the fitted linear model.
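
As a quick illustration of the train_test_split hint above, here is a small, self-contained sketch of an 80/20 split with a fixed random state (dummy data, not the Advertising dataset):

import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data purely for illustration
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

# 80% train / 20% test, reproducible via random_state
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
print(x_train.shape, x_test.shape)  # (8, 2) (2, 2)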

Note: This exercise is auto-graded and you can try multiple attempts.

In [3]:
#import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from prettytable import PrettyTable

Reading the dataset

In [5]:
#Read the file "Advertising.csv"
df = pd.read_csv("Advertising.csv")
In [6]:
#Take a quick look at the data to list all the predictors
df.head()
Out[6]:
      TV  Radio  Newspaper  Sales
0  230.1   37.8       69.2   22.1
1   44.5   39.3       45.1   10.4
2   17.2   45.9       69.3    9.3
3  151.5   41.3       58.5   18.5
4  180.8   10.8       58.4   12.9

Create different multi predictor models

In [18]:
### edTest(test_mse) ###
#List to store the MSE values
mse_list = []

#List of all predictor combinations to fit the curve
cols = [['TV'],['Radio'],['Newspaper'],['TV','Radio'],['TV','Newspaper'],['Radio','Newspaper'],['TV','Radio','Newspaper']]

for i in cols:
    #Set each of the predictors from the previous list as x
    x = df[___]
    
    
    #"Sales" column is the reponse variable
    y = df[___]
    
   
    #Splitting the data into train-test sets with 80% training data and 20% testing data. 
    #Set random_state as 0
    xtrain, xtest, ytrain, ytest = train_test_split(___)

    #Create a LinearRegression object and fit the model
    lreg = LinearRegression()
    lreg.fit(___)
    
    #Predict the response variable for the test set
    y_pred= lreg.predict(___)
    
    #Compute the MSE
    MSE = mean_squared_error(___)
    
    #Append the MSE to the list
    mse_list.append(___)
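
For reference, one possible way to fill in the blanks in the loop above. This is a sketch only, not necessarily the graded solution; it assumes the imports and the dataframe df from the earlier cells, an 80/20 split, and random_state=0 as stated in the instructions:

mse_list = []

for predictors in cols:
    # Predictor combination and the "Sales" response variable
    x = df[predictors]
    y = df['Sales']

    # 80/20 train-test split with a fixed random state
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=0)

    # Fit a linear regression model and predict on the test set
    lreg = LinearRegression()
    lreg.fit(x_train, y_train)
    y_pred = lreg.predict(x_test)

    # Test MSE for this predictor combination
    mse_list.append(mean_squared_error(y_test, y_pred))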

Display the MSE with predictor combinations

In [20]:
t = PrettyTable(['Predictors', 'MSE'])

#Loop to display the predictor combinations along with the MSE value of the corresponding model
for i in range(len(mse_list)):
    t.add_row([cols[i],mse_list[i]])

print(t)
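
To help answer the question below, a small sketch (using the mse_list and cols built above) that picks out the combination with the lowest test MSE:

# Index of the smallest test MSE across all predictor combinations
best_idx = int(np.argmin(mse_list))
print("Lowest MSE:", mse_list[best_idx], "with predictors", cols[best_idx])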

Comment on the trend of MSE values as the predictor combinations change.

Your answer here