{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Title :\n",
"Exercise: Computing the CI\n",
"\n",
"## Description :\n",
"You are the manager of the Advertising division of your company, and your boss asks you the question, **\"How much more sales will we have if we invest $1000 dollars in TV advertising?\"**\n",
"\n",
"
\n",
"\n",
"The goal of this exercise is to estimate the Sales with a 95% confidence interval using the Advertising.csv dataset.\n",
"\n",
"## Data Description:\n",
"\n",
"## Instructions:\n",
"\n",
"- Read the file `Advertising.csv` as a dataframe.\n",
"- Fix a budget amount of 1000 dollars for TV advertising as variable called Budget.\n",
"- Select the number of bootstraps.\n",
"- For each bootstrap:\n",
" - Select a new dataframe with the predictor as TV and the response as Sales.\n",
" - Fit a simple linear regression on the data.\n",
" - Predict on the budget and compute the error estimate using the helper function `error_func()`.\n",
" - Store the sales as a sum of the prediction and the error estimate and append to `sales_list`.\n",
"- Sort the `sales_list` which is a distribution of predicted sales over numboot bootstraps.\n",
"- Compute the 95% confidence interval of `sales_list`.\n",
"- Use the helper function `plot_simulation` to visualize the distribution and print the estimated sales.\n",
"\n",
"## Hints: \n",
"\n",
"np.random.randint()\n",
"Returns list of integers as per mentioned size \n",
"\n",
"df.sample()\n",
"Get a new sample from a dataframe\n",
"\n",
"plt.hist()\n",
"Plots a histogram\n",
"\n",
"plt.axvline()\n",
"Adds a vertical line across the axes\n",
"\n",
"plt.axhline()\n",
"Add a horizontal line across the axes\n",
"\n",
"plt.legend()\n",
"Place a legend on the axes\n",
"\n",
"ndarray.sort()\n",
"Returns the sorted ndarray.\n",
"\n",
"np.percentile(list, q)\n",
"Returns the q-th percentile value based on the provided ascending list of values.\n",
"\n",
"Note: This exercise is **auto-graded and you can try multiple attempts**."
]
},
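{
"cell_type": "markdown",
"metadata": {},
"source": [
"The bootstrap procedure described in the instructions above can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not the exercise solution: the data are randomly generated rather than read from `Advertising.csv`, the line is fitted with `np.polyfit` instead of `LinearRegression`, and the names `x_new`, `n_boot` and `preds` are hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Synthetic data (an assumption for illustration, not Advertising.csv)\n",
"rng = np.random.default_rng(0)\n",
"x = rng.uniform(0, 300, size=200)\n",
"y = 7.0 + 0.05 * x + rng.normal(0, 3, size=200)\n",
"\n",
"x_new = 150.0   # hypothetical predictor value to predict at\n",
"n_boot = 500    # number of bootstrap resamples\n",
"preds = []\n",
"\n",
"for _ in range(n_boot):\n",
"    # Resample row indices with replacement (one bootstrap sample)\n",
"    idx = rng.integers(0, len(x), size=len(x))\n",
"    xb, yb = x[idx], y[idx]\n",
"\n",
"    # Fit a straight line; for degree 1, np.polyfit returns [slope, intercept]\n",
"    slope, intercept = np.polyfit(xb, yb, 1)\n",
"\n",
"    # Residual standard error on the resample (n - 2 degrees of freedom)\n",
"    resid = yb - (intercept + slope * xb)\n",
"    sigma = np.sqrt(np.sum(resid**2) / (len(xb) - 2))\n",
"\n",
"    # Model prediction plus one simulated noise draw\n",
"    preds.append(intercept + slope * x_new + rng.normal(0, sigma))\n",
"\n",
"# Percentile interval: 2.5th and 97.5th percentiles of the simulated values\n",
"lower, upper = np.percentile(preds, [2.5, 97.5])\n",
"print(f'95% interval at x = {x_new}: ({lower:.2f}, {upper:.2f})')\n",
"```\n",
"\n",
"Because each simulated value includes a noise draw, the resulting interval reflects both the uncertainty in the fitted line and the scatter of individual observations around it."
]
},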
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Import necessary libraries\n",
"%matplotlib inline\n",
"import numpy as np\n",
"import pandas as pd\n",
"from scipy import stats\n",
"import matplotlib.pyplot as plt\n",
"from sklearn import preprocessing\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import PolynomialFeatures\n"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [],
"source": [
"# Read the `Advertising.csv` dataframe\n",
"df = pd.read_csv('Advertising.csv')\n",
"\n",
"# Take a quick look at the data\n",
"df.head()\n"
]
},
{
"cell_type": "code",
"execution_count": 134,
"metadata": {},
"outputs": [],
"source": [
"# Helper function to compute the variance of the error term \n",
"def error_func(y,y_p):\n",
" n = len(y)\n",
" return np.sqrt(np.sum((y-y_p)**2/(n-2)))\n"
]
},
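{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reading aid, the helper above can be written as a formula. This is an interpretation of the code, in notation that is not used elsewhere in this notebook:\n",
"\n",
"$$\\hat{\\sigma} = \\sqrt{\\frac{1}{n-2}\\sum_{i=1}^{n}\\left(y_i - \\hat{y}_i\\right)^2}$$\n",
"\n",
"Here $n$ is the number of observations, $y_i$ is the observed response, and $\\hat{y}_i$ is the fitted value. The divisor is $n-2$ because a simple linear regression estimates two parameters, the intercept and the slope. The bootstrap loop further down uses this value as the standard deviation of a normal noise draw."
]
},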
{
"cell_type": "code",
"execution_count": 147,
"metadata": {},
"outputs": [],
"source": [
"# Set the number of bootstraps \n",
"numboot = 1000\n",
"\n",
"# Set the budget as per the instructions given \n",
"# Use 2D list to facilitate model prediction (sklearn.LinearRegression requires input as a 2d array)\n",
"budget = [[___]]\n",
"\n",
"# Initialize an empty list to store sales predictions for each bootstrap\n",
"sales_list = []\n"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {},
"outputs": [],
"source": [
"# Loop through each bootstrap\n",
"for i in range(___):\n",
"\n",
" # Create bootstrapped version of the data using the sample function\n",
" # Set frac=1 and replace=True to get a bootstrap\n",
" df_new = df.sample(___, replace=___)\n",
"\n",
" # Get the predictor data ('TV') from the new bootstrapped data\n",
" x = df_new[[___]]\n",
"\n",
" # Get the response data ('Sales') from the new bootstrapped data\n",
" y = df_new.___\n",
"\n",
" # Initialize a Linear Regression model\n",
" linreg = LinearRegression()\n",
"\n",
" # Fit the model on the new data\n",
" linreg.fit(___,___)\n",
"\n",
" # Predict on the budget from the original data\n",
" prediction = linreg.predict(budget)\n",
"\n",
" # Predict on the bootstrapped data\n",
" y_pred = linreg.predict(x) \n",
"\n",
" # Compute the error using the helper function error_func\n",
" error = np.random.normal(0,error_func(y,y_pred))\n",
" \n",
" # The final sales prediction is the sum of the model prediction \n",
" # and the error term\n",
" sales = ___\n",
"\n",
" # Convert the sales to float type and append to the list\n",
" sales_list.append(np.float64(___))\n"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_sales) ###\n",
"# Sort the list containing sales predictions in ascending order \n",
"sales_list.sort()\n",
"\n",
"# Find the 95% confidence interval using np.percentile function \n",
"# at 2.5% and 97.5%\n",
"sales_CI = (np.percentile(___,___),np.percentile(___, ___))\n"
]
},
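{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sketch, on made-up values, of how the two calls used in the cell above behave. The array `values` and the other names are illustrative only. It suggests that `sort()` works in place and that `np.percentile` gives the same result on sorted and unsorted input.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"values = np.array([5.0, 1.0, 4.0, 2.0, 3.0])\n",
"\n",
"# Percentile computed before sorting\n",
"p_unsorted = np.percentile(values, 50)\n",
"\n",
"# ndarray.sort() sorts in place and returns None\n",
"result = values.sort()\n",
"\n",
"# Percentile computed after sorting\n",
"p_sorted = np.percentile(values, 50)\n",
"\n",
"print(result, values, p_unsorted, p_sorted)\n",
"```\n",
"\n",
"Sorting is therefore harmless before calling `np.percentile`, but the percentile does not depend on it."
]
},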
{
"cell_type": "code",
"execution_count": 138,
"metadata": {},
"outputs": [],
"source": [
"# Helper function to plot the histogram of beta values along \n",
"# with the 95% confidence interval\n",
"def plot_simulation(simulation,confidence):\n",
" plt.hist(simulation, bins = 30, label = 'beta distribution', align = 'left', density = True,edgecolor='k')\n",
" plt.axvline(confidence[1], 0, 1, color = 'r', label = 'Right Interval')\n",
" plt.axvline(confidence[0], 0, 1, color = 'red', label = 'Left Interval')\n",
" plt.xlabel('Beta value')\n",
" plt.ylabel('Frequency')\n",
" plt.legend(frameon = False, loc = 'upper right')\n",
" plt.show();\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Call the plot_simulation function above with the computed sales \n",
"# distribution and the confidence intervals computed earlier\n",
"plot_simulation(sales_list,sales_CI)\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Print the computed values\n",
"print(f\"With a TV advertising budget of ${budget[0][0]},\")\n",
"print(f\"we can expect an increase of sales anywhere between {sales_CI[0]:0.2f} to {sales_CI[1]:.2f}\\\n",
" with a 95% confidence interval\")\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"⏸ The sales predictions here is based on the Simple-Linear regression model between `TV` and `Sales`. Re-run the above exercise by fitting the model considering all variables in `Advertising.csv`. \n",
"\n",
"Keep the budget the same, i.e $1000 for 'TV' advertising. \n",
"You may have to change the `budget` variable to something like `[[1000,0,0]]` for proper computation.\n",
"\n",
"Does your predicted sales interval change?\n",
"Why, or why not?"
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_chow1) ###\n",
"# Type your answer within in the quotes given\n",
"answer1 = '___'\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}