Key Word(s): Linear Regression


Exercise: B.1 - Simple Multi-linear Regression


The aim of this exercise is to understand how to use multi-linear regression. Here we will observe how the MSE of each model changes as the set of predictors changes.


  • Read the file Advertising.csv as a dataframe.
  • For each instance of the predictor combination, form a model. For example, if you have 2 predictors, A and B, you will end up getting 3 models - one with only A, one with only B and one with both A and B.
  • Split the data into train and test sets.
  • Compute the MSE of each model on the test set.
  • Print each predictor combination together with its MSE value.
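The enumeration of predictor combinations described in the list above can be sketched with the standard library. This is a minimal illustration, not part of the graded exercise; the helper name `all_predictor_combinations` is hypothetical and chosen only for this example.

```python
from itertools import combinations


def all_predictor_combinations(predictors):
    """Return every non-empty subset of `predictors`, smallest subsets first."""
    combos = []
    for size in range(1, len(predictors) + 1):
        # combinations() keeps the input order and never repeats a subset
        for subset in combinations(predictors, size):
            combos.append(list(subset))
    return combos


# Two predictors give three models: A alone, B alone, and A with B
print(all_predictor_combinations(["A", "B"]))
# → [['A'], ['B'], ['A', 'B']]
```

With p predictors this yields 2^p - 1 combinations, so the three predictors in this exercise lead to 7 models.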


pd.read_csv(filename) : Returns a pandas dataframe containing the data and labels read from the file

sklearn.preprocessing.normalize() : Scales input vectors individually to unit norm (vector length).

np.interp() : Returns one-dimensional linear interpolation

sklearn.model_selection.train_test_split() : Splits the data into random train and test subsets

sklearn.linear_model.LinearRegression() : Creates a linear regression model; its fit() method fits the model to the training data

LinearRegression.predict() : Predicts using the fitted linear model.
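As a rough sketch of how the split, fit, predict and MSE calls fit together, the example below runs them on synthetic data. The data, the coefficients and the variable names are assumptions made for illustration only; they are not the Advertising data and not the graded solution.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration, not the Advertising data):
# the response depends linearly on two predictors plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=100)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)

# Fit on the training rows only, then predict on the held-out rows.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Mean squared error between the held-out responses and the predictions.
mse = mean_squared_error(y_test, y_pred)
print(f"Test MSE: {mse:.4f}")
```

Computing the MSE on the held-out rows rather than the training rows gives a fairer picture of how each model generalizes.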

Note: This exercise is auto-graded and you can make multiple attempts.

In [3]:
#import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from prettytable import PrettyTable

Reading the dataset

In [5]:
#Read the file "Advertising.csv"
df = pd.read_csv("Advertising.csv")
In [6]:
#Take a quick look at the data to list all the predictors
df.head()
Out[6]:
      TV  Radio  Newspaper  Sales
0  230.1   37.8       69.2   22.1
1   44.5   39.3       45.1   10.4
2   17.2   45.9       69.3    9.3
3  151.5   41.3       58.5   18.5
4  180.8   10.8       58.4   12.9

Create different multi predictor models

In [18]:
### edTest(test_mse) ###
#List to store the MSE values
mse_list = []

#List of all predictor combinations to fit the curve
cols = [['TV'],['Radio'],['Newspaper'],['TV','Radio'],['TV','Newspaper'],['Radio','Newspaper'],['TV','Radio','Newspaper']]

for i in cols:
    #Set each of the predictors from the previous list as x
    x = df[___]
    #"Sales" column is the response variable
    y = df[___]
    #Splitting the data into train-test sets with 80% training data and 20% testing data. 
    #Set random_state as 0
    xtrain, xtest, ytrain, ytest = train_test_split(___)

    #Create a LinearRegression object and fit the model
    lreg = LinearRegression()
    lreg.fit(___)
    #Predict the response variable for the test set
    y_pred= lreg.predict(___)
    #Compute the MSE
    MSE = mean_squared_error(___)
    #Append the MSE to the list
    mse_list.append(___)

Display the MSE with predictor combinations

In [20]:
t = PrettyTable(['Predictors', 'MSE'])

#Loop to display the predictor combinations along with the MSE value of the corresponding model
for i in range(len(mse_list)):
    t.add_row([cols[i], mse_list[i]])

print(t)

Comment on the trend of MSE values with changing predictor(s) combinations.

Your answer here