Key Word(s): Linear Regression


Title

Exercise: A.1 - Guesstimate the β values

Description

The goal of this exercise is to guess a linear model by inspecting a plot of the data, and then to calculate the MSE of your guessed model.

Instructions:

We want to predict sales as a function of the TV advertising budget, using the data in Advertising.csv. To do so, we need 1) a model and 2) a way to measure how well the model fits.

  • Guess the values of the coefficients $\beta_0$ and $\beta_1$ by visually inspecting the plot of the data;
  • Plot your model's prediction (use the formula of a simple linear regression; no package allowed);
  • Change the values of the coefficients $\beta_0$ and $\beta_1$ to improve the fit;
  • Calculate the Mean Squared Error (MSE) for the model.

Hints:

  • Recall the formula for the linear regression model: $\hat{y} = \beta_0 + \beta_1 x$
  • Recall the formula for the Mean Squared Error: $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$
  • Guess $\beta_0$ from the intercept (the value of $y$ where the line crosses $x = 0$) and $\beta_1$ from the slope of the trend in the data
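As a minimal sketch of the two formulas in the hints, the snippet below evaluates a guessed line on a tiny made-up data set and computes its MSE by hand. The arrays `x_toy` and `y_toy` and the coefficient values are illustrative assumptions; they are not taken from Advertising.csv and are not the answer to the exercise.

```python
import numpy as np

# Toy data, invented for illustration only (not from Advertising.csv)
x_toy = np.array([0.0, 1.0, 2.0, 3.0])
y_toy = np.array([1.0, 3.0, 5.0, 8.0])

# Hypothetical guesses for the intercept and the slope
beta0_guess = 1.0
beta1_guess = 2.0

# Linear model: y_hat_i = beta0 + beta1 * x_i, evaluated for every x_i at once
y_hat_toy = beta0_guess + beta1_guess * x_toy

# MSE: the mean of the squared differences between observed and predicted values
mse_toy = np.mean((y_toy - y_hat_toy) ** 2)
print(mse_toy)  # 0.25
```

Only the last point is missed (by 1), so the squared errors are 0, 0, 0, 1 and their mean is 0.25.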

np.mean() : Computes the arithmetic mean along the specified axis

plt.plot() : Plots x versus y as lines and/or markers

plt.xlabel() : Sets the label for the x-axis.

plt.ylabel() : Sets the label for the y-axis.

plt.legend() : Places a legend on the axes
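The plotting functions listed above can be combined as in the following sketch, which draws made-up points as markers and a straight line through them. The arrays `x_demo` and `y_demo` and the line's coefficients are assumptions chosen for illustration. The `matplotlib.use("Agg")` call is only there so the snippet runs as a standalone script without a display; in a notebook with `%matplotlib inline` it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Toy data, invented for illustration only
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = np.array([1.0, 3.0, 5.0, 8.0])

plt.figure()
# The '*' format string draws markers only, with no connecting line
plt.plot(x_demo, y_demo, '*', label='data')
# Without a format string, plt.plot draws a solid line
plt.plot(x_demo, 1.0 + 2.0 * x_demo, label='line')
plt.xlabel('x')
plt.ylabel('y')
# The legend uses the label= values passed to plt.plot
plt.legend()
```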

Note: This exercise is auto-graded and you can try multiple attempts.

In [10]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Reading the dataset

In [11]:
# Data set used in this exercise: Advertising.csv
data_filename = 'Advertising.csv'

# Read the data into a pandas DataFrame
df = pd.read_csv(data_filename)
In [3]:
# Create a new dataframe called `df_new` with the columns 'TV' and 'sales'
df_new = df[['TV', 'sales']]
In [4]:
# Plot the data

plt.plot(df_new.TV, df_new.sales, '*', label='data')
plt.xlabel('TV')
plt.ylabel('Sales')
plt.legend()
Out[4]:

Beta Estimation

In [ ]:
### edTest(test_betas) ###
# Estimate beta0 by observing the value of y when x = 0
beta0 = ___

# Estimate beta1 - Check the slope for guidance
beta1 = ___
In [ ]:
# Calculate the predicted sales for each TV budget using beta0 and beta1
y_predict = ___

Plotting the graph

In [ ]:
# Plot the predicted values as well as the data
plt.plot(df_new.TV, df_new.sales, '*', label='data')
plt.plot(df_new.TV, y_predict, label='model')
plt.xlabel('TV')
plt.ylabel('Sales')
plt.legend()

MSE Computation

In [ ]:
### edTest(test_mse) ###
# Calculate the MSE
MSE = ___

# Print the results
print("My MSE is: {0}".format(MSE))
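Improving the fit means changing the coefficients and checking whether the MSE goes down. The sketch below compares two guesses on made-up data; the helper `mse_for`, the arrays, and the coefficient values are all hypothetical and chosen only to illustrate the comparison, not to solve the exercise.

```python
import numpy as np

def mse_for(beta0, beta1, x, y):
    """Return the MSE of the line beta0 + beta1 * x against y (hypothetical helper)."""
    y_hat = beta0 + beta1 * x
    return np.mean((y - y_hat) ** 2)

# Toy data, invented for illustration only (not from Advertising.csv)
x_toy = np.array([0.0, 1.0, 2.0, 3.0])
y_toy = np.array([1.0, 3.0, 5.0, 8.0])

# Two candidate guesses that differ only in the intercept
first_guess = mse_for(0.0, 2.0, x_toy, y_toy)
second_guess = mse_for(1.0, 2.0, x_toy, y_toy)

print(first_guess, second_guess)  # 1.75 0.25
```

The second guess has the lower MSE, so it fits these toy points better. The same comparison applies when adjusting `beta0` and `beta1` on the real data.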