{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Title :\n",
"Exercise: Confidence Intervals for Beta value\n",
"\n",
"## Description :\n",
"The goal of this exercise is to create a plot like the one given below for $\\beta_0$ and $\\beta_1$. \n",
"\n",
"
\n",
"\n",
"## Data Description:\n",
"\n",
"## Instructions:\n",
"\n",
"- Follow the steps from the previous exercise to get the lists of beta values.\n",
"- Sort the list of beta values in ascending order (from low to high).\n",
"- To compute the 95% confidence interval, find the 2.5 percentile and the 97.5 percentile using `np.percentile()`. \n",
"- Use the helper code `plot_simulation()` to visualise the $\\beta$ values along with its confidence interval\n",
"\n",
"## Hints: \n",
"\n",
"$${\\widehat {\\beta_1 }}={\\frac {\\sum _{i=1}^{n}(x_{i}-{\\bar {x}})(y_{i}-{\\bar {y}})}{\\sum _{i=1}^{n}(x_{i}-{\\bar {x}})^{2}}}$$\n",
"\n",
"$${\\widehat {\\beta_0 }}={\\bar {y}}-{\\widehat {\\beta_1 }}\\,{\\bar {x}}$$\n",
"\n",
"np.random.randint()\n",
"Returns list of integers as per mentioned size \n",
"\n",
"df.iloc[]\n",
"Purely integer-location based indexing for selection by position\n",
"\n",
"plt.hist()\n",
"Plots a histogram\n",
"\n",
"plt.axvline()\n",
"Adds a vertical line across the axes\n",
"\n",
"plt.axhline()\n",
"Add a horizontal line across the axes\n",
"\n",
"plt.xlabel()\n",
"Sets the label for the x-axis\n",
"\n",
"plt.ylabel()\n",
"Sets the label for the y-axis\n",
"\n",
"plt.legend()\n",
"Place a legend on the axes\n",
"\n",
"ndarray.sort()\n",
"Returns the sorted ndarray.\n",
"\n",
"np.percentile(list, q)\n",
"Returns the q-th percentile value based on the provided ascending list of values.\n",
"\n",
"**Note:** This exercise is **auto-graded and you can try multiple attempts**."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Import necessary libraries\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reading the standard Advertising dataset"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Read the 'Advertising_adj.csv' file\n",
"df = pd.read_csv('Advertising_adj.csv')\n",
"\n",
"# Take a quick look at the data\n",
"df.head(3)\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Use the bootstrap function defined in the previous exercise\n",
"def bootstrap(df):\n",
" selectionIndex = np.random.randint(len(df), size = len(df))\n",
" new_df = df.iloc[selectionIndex]\n",
" return new_df\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Initialize empty lists to store beta values from 100 bootstraps \n",
"# of the original data\n",
"beta0_list, beta1_list = [],[]\n",
"\n",
"# Set the number of bootstraps\n",
"numberOfBootstraps = 100\n",
"\n",
"# Loop over the number of bootstraps\n",
"for i in range(numberOfBootstraps):\n",
" \n",
" # Call the function bootstrap with the original dataframe\n",
" df_new = bootstrap(df)\n",
" \n",
" # Compute the mean of the predictor i.e. the TV column\n",
" xmean = df_new.tv.mean()\n",
"\n",
" # Compute the mean of the response i.e. the Sales column\n",
" ymean = df_new.sales.mean()\n",
" \n",
" # Compute beta1 analytical using the equation in the hints\n",
" beta1 = (((df_new.tv - xmean)*(df_new.sales - ymean)).sum())/(((df_new.tv - xmean)**2).sum())\n",
"\n",
" # Compute beta1 analytical using the equation in the hints\n",
" beta0 = ymean - beta1*xmean\n",
" \n",
" # Append the beta values to their appropriate lists\n",
" beta0_list.append(beta0)\n",
" beta1_list.append(beta1)\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_sort) ###\n",
"\n",
"# Sort the two lists of beta values from the lowest value to highest \n",
"beta0_list.___;\n",
"beta1_list.___;\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_beta) ###\n",
"\n",
"# Find the 95% percent confidence for beta0 interval using the \n",
"# percentile function\n",
"beta0_CI = (np.___,np.___)\n",
"\n",
"# Find the 95% percent confidence for beta1 interval using the \n",
"# percentile function\n",
"beta1_CI = (np.___,np.___)\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Print the confidence interval of beta0 upto 3 decimal points\n",
"print(f'The beta0 confidence interval is {___}')\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Print the confidence interval of beta1 upto 3 decimal points\n",
"print(f'The beta1 confidence interval is {___}')\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# Helper function to plot the histogram of beta values along with \n",
"# the 95% confidence interval\n",
"def plot_simulation(simulation,confidence):\n",
" plt.hist(simulation, bins = 30, label = 'beta distribution', align = 'left', density = True)\n",
" plt.axvline(confidence[1], 0, 1, color = 'r', label = 'Right Interval')\n",
" plt.axvline(confidence[0], 0, 1, color = 'red', label = 'Left Interval')\n",
" plt.xlabel('Beta value')\n",
" plt.ylabel('Frequency')\n",
" plt.title('Confidence Interval')\n",
" plt.legend(frameon = False, loc = 'upper right')\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Call the function plot_simulation to get the histogram for beta 0\n",
"# with the confidence interval\n",
"plot_simulation(___,___)\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Call the function plot_simulation to get the histogram for beta 1\n",
"# with the confidence interval\n",
"plot_simulation(___,___)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}