Key Word(s): Knn, Knn Regression, MSE, Data Plotting
Title :
Description :
The aim of this exercise is to plot TV Ads vs Sales from the Advertisement dataset. The plot should look similar to the graph given below.
Data Description:
Instructions:
- Read the Advertisement data and view the top rows of the dataframe to get an understanding of the data and the columns.
- Select the first 7 observations and the columns TV and Sales to make a new data frame.
- Create a scatter plot of TV budget vs Sales from the new data frame.
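The three steps above can be sketched as follows. This is a minimal illustration, not the graded solution: it builds the dataframe in memory from the seven rows printed later in this notebook instead of reading "Advertising.csv", the name df_small is hypothetical, and the "Agg" backend is an assumption made so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for pd.read_csv("Advertising.csv"): the first seven rows,
# copied from the printed output in this notebook.
df = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8],
})

# Step 1: view the top rows to see the columns
print(df.head())

# Step 2: keep only the TV and Sales columns, then the first 7 observations
df_small = df.loc[:, ["TV", "Sales"]].head(7)

# Step 3: scatter plot of TV budget vs Sales
fig, ax = plt.subplots()
ax.scatter(df_small["TV"], df_small["Sales"])
ax.set_xlabel("TV budget")
ax.set_ylabel("Sales")
```

The notebook's own cell uses df.head(7), which keeps all four columns. Selecting TV and Sales explicitly, as here, follows the wording of the second instruction more literally; both give the same scatter plot.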
Hints:
pd.read_csv(filename): Returns a pandas dataframe containing the data and column labels read from the file
df.iloc[]: Returns the subset of the dataframe selected by the integer row (and optionally column) positions passed as the argument
np.linspace(): Returns evenly spaced numbers over a specified interval
df.head(): Returns the first 5 rows of the dataframe (or the first n rows if n is given) with the column names
plt.scatter(): Creates a scatter plot of y vs. x with optional varying marker size and/or color
plt.xlabel(): Sets the text to be displayed as the label for the x-axis
plt.ylabel(): Sets the text to be displayed as the label for the y-axis
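As a small illustration of two of the hints, the sketch below shows position-based selection with df.iloc and an evenly spaced grid from np.linspace. The three-row dataframe is a stand-in copied from the notebook's printed rows, the column order (TV, Radio, Newspaper, Sales) is assumed from the describe() output, and the names subset and grid are hypothetical.

```python
import numpy as np
import pandas as pd

# Three rows copied from the notebook output, in the assumed column order
df = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2],
    "Radio": [37.8, 39.3, 45.9],
    "Newspaper": [69.2, 45.1, 69.3],
    "Sales": [22.1, 10.4, 9.3],
})

# iloc selects by integer position: rows 0-1, columns 0 (TV) and 3 (Sales)
subset = df.iloc[:2, [0, 3]]
print(subset)

# linspace returns evenly spaced values, endpoints included
grid = np.linspace(0, 300, 4)
print(grid)
```

Because iloc works on positions rather than labels, the column indices would need to change if the file had a different column order.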
Note: This exercise is auto-graded and you can try multiple attempts.
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Reading the Advertisement dataset
# "Advertising.csv" contains the data set used in this exercise
data_filename = 'Advertising.csv'
# Read the "Advertising.csv" file using the pandas library
df = pd.read_csv(data_filename)
# Get a quick look at the data
df.describe()
| | TV | Radio | Newspaper | Sales |
---|---|---|---|---|
count | 200.000000 | 200.000000 | 200.000000 | 200.000000 |
mean | 147.042500 | 23.264000 | 30.554000 | 14.022500 |
std | 85.854236 | 14.846809 | 21.778621 | 5.217457 |
min | 0.700000 | 0.000000 | 0.300000 | 1.600000 |
25% | 74.375000 | 9.975000 | 12.750000 | 10.375000 |
50% | 149.750000 | 22.900000 | 25.750000 | 12.900000 |
75% | 218.825000 | 36.525000 | 45.100000 | 17.400000 |
max | 296.400000 | 49.600000 | 114.000000 | 27.000000 |
### edTest(test_pandas) ###
# Create a new dataframe by selecting the first 7 rows of
# the current dataframe
df_new = df.head(7)
# Print your new dataframe to see if you have selected 7 rows correctly
print(df_new)
      TV  Radio  Newspaper  Sales
0  230.1   37.8       69.2   22.1
1   44.5   39.3       45.1   10.4
2   17.2   45.9       69.3    9.3
3  151.5   41.3       58.5   18.5
4  180.8   10.8       58.4   12.9
5    8.7   48.9       75.0    7.2
6   57.5   32.8       23.5   11.8
Plotting the graph
# Use a scatter plot for plotting a graph of TV vs Sales
plt.scatter(df_new.TV, df_new.Sales)
# Add axis labels for clarity (x : TV budget, y : Sales)
plt.xlabel("TV budget")
plt.ylabel("Sales")
Text(0, 0.5, 'Sales')
Post-Exercise Question
Instead of plotting just seven points, try plotting all of the points.
# Your code here
plt.scatter(df.TV, df.Sales)
# Add axis labels for clarity (x : TV budget, y : Sales)
plt.xlabel("TV budget")
plt.ylabel("Sales")
Text(0, 0.5, 'Sales')
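One possible way to extend this experiment is to draw all observations and overlay the first few in a different color, so the subset used earlier stays visible. The helper plot_tv_vs_sales and its highlight_n parameter are hypothetical names introduced for this sketch. The demo data is the seven rows printed earlier in this notebook, not the full file, and the "Agg" backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_tv_vs_sales(df, highlight_n=7):
    """Scatter TV vs Sales for all rows, highlighting the first highlight_n rows."""
    fig, ax = plt.subplots()
    # All observations in a muted color
    ax.scatter(df["TV"], df["Sales"], color="lightgray", label="all observations")
    # First highlight_n observations drawn on top
    first = df.head(highlight_n)
    ax.scatter(first["TV"], first["Sales"], color="tab:blue",
               label=f"first {highlight_n}")
    ax.set_xlabel("TV budget")
    ax.set_ylabel("Sales")
    ax.legend()
    return ax


# Demo on the seven rows shown earlier, highlighting the first three
demo = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8],
})
ax = plot_tv_vs_sales(demo, highlight_n=3)
```

With the full dataframe loaded from "Advertising.csv", the call would be plot_tv_vs_sales(df) to highlight the same seven rows plotted in the exercise.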