Key Word(s): Linear Regression
Title
Exercise: B.1 - Simple Multi-linear Regression
Description
The aim of this exercise is to understand how to use multi-linear regression. We fit one model per combination of predictors and observe how the test MSE changes as the predictors change.
Instructions:
- Read the file Advertising.csv as a dataframe.
- Form one model for each combination of predictors. For example, with 2 predictors, A and B, you get 3 models: one with only A, one with only B, and one with both A and B.
- Split the data into train and test sets.
- Compute the MSE of each model on the test set.
- Print each predictor combination together with its MSE.
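The predictor combinations described above can also be generated programmatically instead of being typed out by hand. The following is a minimal sketch using the standard-library `itertools.combinations`; the helper name `predictor_subsets` is hypothetical and is not part of the exercise template.

```python
from itertools import combinations


def predictor_subsets(predictors):
    """Return every non-empty subset of `predictors`, smallest subsets first."""
    subsets = []
    for size in range(1, len(predictors) + 1):
        # combinations() yields tuples in the order of the input list
        for combo in combinations(predictors, size):
            subsets.append(list(combo))
    return subsets


# Two predictors give 2**2 - 1 = 3 models, as in the A/B example.
print(predictor_subsets(["A", "B"]))  # [['A'], ['B'], ['A', 'B']]
```

With three predictors the same helper returns 7 subsets, one per model to fit.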
Hints:
pd.read_csv(filename) : Returns a pandas dataframe containing the data and labels from the file
sklearn.preprocessing.normalize() : Scales input vectors individually to unit norm (vector length).
np.interp() : Returns one-dimensional linear interpolation
sklearn.model_selection.train_test_split() : Splits the data into random train and test subsets
sklearn.linear_model.LinearRegression() : Creates a linear regression model object
LinearRegression.fit() : Fits the linear model to the training data
LinearRegression.predict() : Predicts using the fitted linear model
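To see how the hinted calls fit together, here is a minimal sketch on synthetic data. The column names `a`, `b`, and `target` and the generated values are assumptions made for illustration only; they are not the exercise dataset, and the sketch is not the graded solution.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical data: two predictors and a noisy linear response
rng = np.random.default_rng(0)
toy = pd.DataFrame({"a": rng.normal(size=100), "b": rng.normal(size=100)})
toy["target"] = 3.0 * toy["a"] - 2.0 * toy["b"] + rng.normal(scale=0.5, size=100)

x = toy[["a", "b"]]  # a list of column names selects a 2-D dataframe of predictors
y = toy["target"]    # a single column name selects the 1-D response

# Hold out 20% of the rows for testing; random_state makes the split repeatable
x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.8, random_state=0
)

model = LinearRegression()
model.fit(x_train, y_train)     # fit on the training rows only
y_pred = model.predict(x_test)  # predict on the held-out rows

# mean_squared_error is the mean of the squared residuals
mse = mean_squared_error(y_test, y_pred)
manual_mse = np.mean((y_test - y_pred) ** 2)
print(mse, manual_mse)
```

The two printed values agree, which confirms that the MSE is the average squared difference between the true and predicted responses on the test set.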
Note: This exercise is auto-graded and you can try multiple attempts.
#import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from prettytable import PrettyTable
Reading the dataset
#Read the file "Advertising.csv"
df = pd.read_csv("Advertising.csv")
#Take a quick look at the data to list all the predictors
df.head()
Create different multi predictor models
### edTest(test_mse) ###
#List to store the MSE values
mse_list = []
#List of all predictor combinations to fit the curve
cols = [['TV'],['Radio'],['Newspaper'],['TV','Radio'],['TV','Newspaper'],['Radio','Newspaper'],['TV','Radio','Newspaper']]
for i in cols:
    #Set each of the predictors from the previous list as x
    x = df[___]
    #"Sales" column is the response variable
    y = df[___]
    #Splitting the data into train-test sets with 80% training data and 20% testing data.
    #Set random_state as 0
    xtrain, xtest, ytrain, ytest = train_test_split(___)
    #Create a LinearRegression object and fit the model
    lreg = LinearRegression()
    lreg.fit(___)
    #Predict the response variable for the test set
    y_pred = lreg.predict(___)
    #Compute the MSE
    MSE = mean_squared_error(___)
    #Append the MSE to the list
    mse_list.append(___)
Display the MSE with predictor combinations
t = PrettyTable(['Predictors', 'MSE'])
#Loop to display the predictor combinations along with the MSE value of the corresponding model
for i in range(len(mse_list)):
    t.add_row([cols[i],mse_list[i]])
print(t)
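Once the table is printed, it can help to rank the models so the lowest-MSE combination is easy to spot. The sketch below is one possible way to do that; the helper `rank_by_mse` is hypothetical, and the numbers are made up purely for illustration and are not results from the dataset.

```python
def rank_by_mse(predictor_sets, mse_values):
    """Pair each predictor set with its MSE and sort from lowest to highest MSE."""
    pairs = zip(predictor_sets, mse_values)
    return sorted(pairs, key=lambda pair: pair[1])


# Made-up values for illustration only; they are not results from the dataset.
example_sets = [["A"], ["B"], ["A", "B"]]
example_mse = [4.0, 9.0, 1.5]

for predictors, mse in rank_by_mse(example_sets, example_mse):
    print(predictors, mse)
```

In the exercise itself, the same helper could be called with `cols` and `mse_list` after the loop has filled the list.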