Key Word(s): Linear Regression, MSE, Data Plotting
Title:
Exercise: Linear Regression using Sklearn
Description:
The goal of this exercise is to use the sklearn package to fit a Linear Regression on the previously used Advertising.csv datafile and produce a plot like the one given below.
Data Description:
Instructions:
- Use the train_test_split() function to split the dataset into training and testing sets
- Use the LinearRegression class to make a model
- Fit the model on the training set
- Predict on the testing set using the fitted model
- Estimate the fit of the model using the mean_squared_error function
- Plot the dataset along with the predictions to visualize the fit
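The steps above can be sketched end to end on made-up data. This is only an illustration, not the solution to the scaffold below: the synthetic arrays x_demo and y_demo, the fixed random_state and all variable names are assumptions introduced here, standing in for the real Advertising.csv columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): y = 3x + 5 plus noise
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 100, size=(200, 1))   # 2-D predictor, as sklearn expects
y_demo = 3.0 * x_demo[:, 0] + 5.0 + rng.normal(0, 2.0, size=200)

# Step 1: split into training and testing sets (80% / 20%)
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, train_size=0.8, random_state=0
)

# Steps 2 and 3: create the model and fit it on the training set only
demo_model = LinearRegression()
demo_model.fit(x_tr, y_tr)

# Step 4: predict on the testing set using the fitted model
y_hat = demo_model.predict(x_te)

# Step 5: estimate the fit with the mean squared error on the test set
demo_mse = mean_squared_error(y_te, y_hat)
print(f"slope={demo_model.coef_[0]:.2f}, "
      f"intercept={demo_model.intercept_:.2f}, test MSE={demo_mse:.2f}")
```

Because the data were generated from a known line, the fitted slope and intercept should land close to 3 and 5, which gives a quick sanity check that the split, fit and predict calls are wired together correctly.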
Hints:
pd.read_csv(filename): Returns a pandas dataframe containing the data and labels from the file
sklearn.model_selection.train_test_split(): Splits the data into random train and test subsets
sklearn.linear_model.LinearRegression(): Creates a linear regression model
model.fit(): Fits the linear model to the training data
model.predict(): Predicts using the fitted linear model
sklearn.metrics.mean_squared_error(): Computes the mean squared error regression loss
plt.plot(): Plots y versus x as lines and/or markers
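To make the mean_squared_error() hint concrete: the mean squared error is the average of the squared differences between the true and predicted values, $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. The short check below computes it by hand and with sklearn; the four numbers in each array are invented purely for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up true and predicted values (illustrative only)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 8.0, 9.5])

# Manual computation: mean of the squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# Same quantity via sklearn (true values first, predictions second)
mse_sklearn = mean_squared_error(y_true, y_hat)

print(mse_manual, mse_sklearn)  # both are 0.375
```

The residuals here are 0.5, 0, -1 and -0.5, so the squared errors sum to 1.5 and the mean over four points is 0.375.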
Note: This exercise is auto-graded, so remember to set all parameters to the values given in the scaffold before submitting it for marking.
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
%matplotlib inline
# Read the data from the file "Advertising.csv"
df = pd.read_csv('Advertising.csv')
# Take a quick look at the data
df.head()
# Assign TV advertising as predictor variable 'x'
x = df[['TV']]
# Set the Sales column as the response variable 'y'
y = df['Sales']
# Split the dataset into train and test sets, with 80% of the data in the training set
x_train, x_test, y_train, y_test = train_test_split(___,___,train_size=0.8)
# Initialize a Linear Regression model using Sklearn
model = LinearRegression()
# Fit the linear model on the train data
model.fit(___, ___)
# Predict on the test data using the trained model
y_pred_test = model.predict(___)
### edTest(test_mse) ###
# Compute the MSE of the predicted test values
mse = mean_squared_error(___, ___)
# Print the computed MSE
print(f'The test MSE is {___}')
# Make a plot of the data along with the predicted linear regression
fig, ax = plt.subplots()
ax.scatter(x,y,label='data points')
# Plot the test data and the predicted output of test data
ax.plot(___,___,color='red',linewidth=2,label='model predictions')
ax.set_xlabel('Advertising')
ax.set_ylabel('Sales')
ax.legend()
⏸ How does your $MSE$ change when the size of the training set is changed to 60% instead of 80%?
### edTest(test_chow1) ###
# Type your answer within the quotes given
answer1 = '___'
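One way to explore the training-set-size question is to repeat the split several times for each size, since the test MSE from a single random split depends on which rows happen to land in the test set. The sketch below does this on synthetic data; the data, the helper split_mse, the choice of 20 seeds and every variable name are assumptions made for illustration, and the numbers it prints say nothing definitive about the Advertising data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): y = 3x + 5 plus noise
rng = np.random.default_rng(1)
x_demo = rng.uniform(0, 100, size=(200, 1))
y_demo = 3.0 * x_demo[:, 0] + 5.0 + rng.normal(0, 2.0, size=200)


def split_mse(train_size, seed):
    """Hypothetical helper: test MSE for one random split of the demo data."""
    x_tr, x_te, y_tr, y_te = train_test_split(
        x_demo, y_demo, train_size=train_size, random_state=seed
    )
    fitted = LinearRegression().fit(x_tr, y_tr)
    return mean_squared_error(y_te, fitted.predict(x_te))


# Average over several seeds so one lucky or unlucky split does not dominate
results = {}
for train_size in (0.6, 0.8):
    scores = [split_mse(train_size, seed) for seed in range(20)]
    results[train_size] = (np.mean(scores), np.std(scores))
    print(f"train_size={train_size}: "
          f"mean test MSE={results[train_size][0]:.3f}, "
          f"std={results[train_size][1]:.3f}")
```

Comparing the spread (std) with the gap between the two means helps judge whether a difference seen on a single split reflects the training-set size or just the randomness of the split.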