{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Title :\n", "Exercise: Confidence Intervals for Beta Values\n", "\n", "## Description :\n", "The goal of this exercise is to plot the bootstrapped distributions of $\\beta_0$ and $\\beta_1$ along with their 95% confidence intervals.\n", "\n", "## Data Description:\n", "\n", "## Instructions:\n", "\n", "- Follow the steps from the previous exercise to get the lists of beta values.\n", "- Sort each list of beta values in ascending order (from low to high).\n", "- To compute the 95% confidence interval, find the 2.5th and 97.5th percentiles using `np.percentile()`.\n", "- Use the helper code `plot_simulation()` to visualise the $\\beta$ values along with their confidence interval.\n", "\n", "## Hints: \n", "\n", "$${\\widehat {\\beta_1 }}={\\frac {\\sum _{i=1}^{n}(x_{i}-{\\bar {x}})(y_{i}-{\\bar {y}})}{\\sum _{i=1}^{n}(x_{i}-{\\bar {x}})^{2}}}$$\n", "\n", "$${\\widehat {\\beta_0 }}={\\bar {y}}-{\\widehat {\\beta_1 }}\\,{\\bar {x}}$$\n", "\n", "np.random.randint()\n", "Returns an array of random integers of the given size\n", "\n", "df.iloc[]\n", "Purely integer-location based indexing for selection by position\n", "\n", "plt.hist()\n", "Plots a histogram\n", "\n", "plt.axvline()\n", "Adds a vertical line across the axes\n", "\n", "plt.axhline()\n", "Adds a horizontal line across the axes\n", "\n", "plt.xlabel()\n", "Sets the label for the x-axis\n", "\n", "plt.ylabel()\n", "Sets the label for the y-axis\n", "\n", "plt.legend()\n", "Places a legend on the axes\n", "\n", "ndarray.sort()\n", "Sorts the array in place and returns `None`; `list.sort()` behaves the same way for Python lists.\n", "\n", "np.percentile(list, q)\n", "Returns the q-th percentile of the provided list of values.\n", "\n", "**Note:** This exercise is **auto-graded, and you can make multiple attempts**." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import necessary libraries\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reading the standard Advertising dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Read the 'Advertising_adj.csv' file\n", "df = pd.read_csv('Advertising_adj.csv')\n", "\n", "# Take a quick look at the data\n", "df.head(3)\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Use the bootstrap function defined in the previous exercise\n", "def bootstrap(df):\n", " selectionIndex = np.random.randint(len(df), size = len(df))\n", " new_df = df.iloc[selectionIndex]\n", " return new_df\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Initialize empty lists to store beta values from 100 bootstraps \n", "# of the original data\n", "beta0_list, beta1_list = [],[]\n", "\n", "# Set the number of bootstraps\n", "numberOfBootstraps = 100\n", "\n", "# Loop over the number of bootstraps\n", "for i in range(numberOfBootstraps):\n", " \n", " # Call the function bootstrap with the original dataframe\n", " df_new = bootstrap(df)\n", " \n", " # Compute the mean of the predictor i.e. the TV column\n", " xmean = df_new.tv.mean()\n", "\n", " # Compute the mean of the response i.e. 
the Sales column\n", " ymean = df_new.sales.mean()\n", " \n", " # Compute beta1 analytically using the equation in the hints\n", " beta1 = (((df_new.tv - xmean)*(df_new.sales - ymean)).sum())/(((df_new.tv - xmean)**2).sum())\n", "\n", " # Compute beta0 analytically using the equation in the hints\n", " beta0 = ymean - beta1*xmean\n", " \n", " # Append the beta values to their appropriate lists\n", " beta0_list.append(beta0)\n", " beta1_list.append(beta1)\n", " " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "### edTest(test_sort) ###\n", "\n", "# Sort the two lists of beta values from the lowest value to highest \n", "beta0_list.___;\n", "beta1_list.___;\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "### edTest(test_beta) ###\n", "\n", "# Find the 95% confidence interval for beta0 using the \n", "# percentile function\n", "beta0_CI = (np.___,np.___)\n", "\n", "# Find the 95% confidence interval for beta1 using the \n", "# percentile function\n", "beta1_CI = (np.___,np.___)\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Print the confidence interval of beta0 up to 3 decimal places\n", "print(f'The beta0 confidence interval is {___}')\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Print the confidence interval of beta1 up to 3 decimal places\n", "print(f'The beta1 confidence interval is {___}')\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Helper function to plot the histogram of beta values along with \n", "# the 95% confidence interval\n", "def plot_simulation(simulation,confidence):\n", " plt.hist(simulation, bins = 30, label = 'beta distribution', align = 'left', density = True)\n", " plt.axvline(confidence[1], 0, 1, color = 'r', label = 'Right Interval')\n", " plt.axvline(confidence[0], 0, 1, color = 
'red', label = 'Left Interval')\n", " plt.xlabel('Beta value')\n", " plt.ylabel('Frequency')\n", " plt.title('Confidence Interval')\n", " plt.legend(frameon = False, loc = 'upper right')\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Call the function plot_simulation to get the histogram for beta 0\n", "# with the confidence interval\n", "plot_simulation(___,___)\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Call the function plot_simulation to get the histogram for beta 1\n", "# with the confidence interval\n", "plot_simulation(___,___)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }