{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Title :\n",
    "Exercise: Confidence Intervals for Beta value\n",
    "\n",
    "## Description :\n",
    "The goal of this exercise is to create a plot like the one given below for $\\beta_0$ and $\\beta_1$. \n",
    "\n",
    "<img src=\"../fig/fig2.png\" style=\"width: 500px;\">\n",
    "\n",
    "## Data Description:\n",
    "\n",
    "## Instructions:\n",
    "\n",
    "- Follow the steps from the previous exercise to get the lists of beta values.\n",
    "- Sort the list of beta values in ascending order (from low to high).\n",
    "- To compute the 95% confidence interval, find the 2.5 percentile and the 97.5 percentile using `np.percentile()`. \n",
    "- Use the helper code `plot_simulation()` to visualise the $\\beta$ values along with its confidence interval\n",
    "\n",
    "## Hints: \n",
    "\n",
    "$${\\widehat {\\beta_1 }}={\\frac {\\sum _{i=1}^{n}(x_{i}-{\\bar {x}})(y_{i}-{\\bar {y}})}{\\sum _{i=1}^{n}(x_{i}-{\\bar {x}})^{2}}}$$\n",
    "\n",
    "$${\\widehat {\\beta_0 }}={\\bar {y}}-{\\widehat {\\beta_1 }}\\,{\\bar {x}}$$\n",
    "\n",
    "<a href=\"https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.randint.html\" target=\"_blank\">np.random.randint()</a>\n",
    "Returns list of integers as per mentioned size \n",
    "\n",
    "<a href=\"https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html\" target=\"_blank\">df.iloc[]</a>\n",
    "Purely integer-location based indexing for selection by position\n",
    "\n",
    "<a href=\"https://matplotlib.org/3.2.2/api/_as_gen/matplotlib.pyplot.hist.html\" target=\"_blank\">plt.hist()</a>\n",
    "Plots a histogram\n",
    "\n",
    "<a href=\"https://matplotlib.org/api/_as_gen/matplotlib.pyplot.axvline.html\" target=\"_blank\">plt.axvline()</a>\n",
    "Adds a vertical line across the axes\n",
    "\n",
    "<a href=\"https://matplotlib.org/api/_as_gen/matplotlib.pyplot.axhline.html\" target=\"_blank\">plt.axhline()</a>\n",
    "Add a horizontal line across the axes\n",
    "\n",
    "<a href=\"https://matplotlib.org/api/_as_gen/matplotlib.pyplot.xlabel.html\" target=\"_blank\">plt.xlabel()</a>\n",
    "Sets the label for the x-axis\n",
    "\n",
    "<a href=\"https://matplotlib.org/api/_as_gen/matplotlib.pyplot.ylabel.html\" target=\"_blank\">plt.ylabel()</a>\n",
    "Sets  the label for the y-axis\n",
    "\n",
    "<a href=\"https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html\" target=\"_blank\">plt.legend()</a>\n",
    "Place a legend on the axes\n",
    "\n",
    "<a href=\"https://numpy.org/doc/stable/reference/generated/numpy.ndarray.sort.html#numpy.ndarray.sort\" target=\"_blank\">ndarray.sort()</a>\n",
    "Returns the sorted ndarray.\n",
    "\n",
    "<a href=\"https://numpy.org/doc/stable/reference/generated/numpy.percentile.html\" target=\"_blank\">np.percentile(list, q)</a>\n",
    "Returns the q-th percentile value based on the provided ascending list of values.\n",
    "\n",
    "**Note:** This exercise is **auto-graded and you can try multiple attempts**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import necessary libraries\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reading the standard Advertising dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the 'Advertising_adj.csv' file\n",
    "df = pd.read_csv('Advertising_adj.csv')\n",
    "\n",
    "# Take a quick look at the data\n",
    "df.head(3)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use the bootstrap function defined in the previous exercise\n",
    "def bootstrap(df):\n",
    "    selectionIndex = np.random.randint(len(df), size = len(df))\n",
    "    new_df = df.iloc[selectionIndex]\n",
    "    return new_df\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize empty lists to store beta values from 100 bootstraps \n",
    "# of the original data\n",
    "beta0_list, beta1_list = [],[]\n",
    "\n",
    "# Set the number of bootstraps\n",
    "numberOfBootstraps = 100\n",
    "\n",
    "# Loop over the number of bootstraps\n",
    "for i in range(numberOfBootstraps):\n",
    "    \n",
    "    # Call the function bootstrap with the original dataframe\n",
    "    df_new = bootstrap(df)\n",
    "    \n",
    "    # Compute the mean of the predictor i.e. the TV column\n",
    "    xmean = df_new.tv.mean()\n",
    "\n",
    "    # Compute the mean of the response i.e. the Sales column\n",
    "    ymean = df_new.sales.mean()\n",
    "    \n",
    "    # Compute beta1 analytical using the equation in the hints\n",
    "    beta1 = (((df_new.tv - xmean)*(df_new.sales - ymean)).sum())/(((df_new.tv - xmean)**2).sum())\n",
    "\n",
    "    # Compute beta1 analytical using the equation in the hints\n",
    "    beta0 = ymean - beta1*xmean\n",
    "    \n",
    "    # Append the beta values to their appropriate lists\n",
    "    beta0_list.append(beta0)\n",
    "    beta1_list.append(beta1)\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "### edTest(test_sort) ###\n",
    "\n",
    "# Sort the two lists of beta values from the lowest value to highest \n",
    "beta0_list.___;\n",
    "beta1_list.___;\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "### edTest(test_beta) ###\n",
    "\n",
    "# Find the 95% percent confidence for beta0 interval using the \n",
    "# percentile function\n",
    "beta0_CI = (np.___,np.___)\n",
    "\n",
    "# Find the 95% percent confidence for beta1 interval using the \n",
    "# percentile function\n",
    "beta1_CI = (np.___,np.___)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the confidence interval of beta0 upto 3 decimal points\n",
    "print(f'The beta0 confidence interval is {___}')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the confidence interval of beta1 upto 3 decimal points\n",
    "print(f'The beta1 confidence interval is {___}')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper function to plot the histogram of beta values along with \n",
    "# the 95% confidence interval\n",
    "def plot_simulation(simulation,confidence):\n",
    "    plt.hist(simulation, bins = 30, label = 'beta distribution', align = 'left', density = True)\n",
    "    plt.axvline(confidence[1], 0, 1, color = 'r', label = 'Right Interval')\n",
    "    plt.axvline(confidence[0], 0, 1, color = 'red', label = 'Left Interval')\n",
    "    plt.xlabel('Beta value')\n",
    "    plt.ylabel('Frequency')\n",
    "    plt.title('Confidence Interval')\n",
    "    plt.legend(frameon = False, loc = 'upper right')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Call the function plot_simulation to get the histogram for beta 0\n",
    "# with the confidence interval\n",
    "plot_simulation(___,___)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Call the function plot_simulation to get the histogram for beta 1\n",
    "# with the confidence interval\n",
    "plot_simulation(___,___)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}