{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Title :\n", "Exercise: Simple kNN Regression\n", "\n", "## Description :\n", "The goal of this exercise is to **re-create the plots** given below. You will have come across these graphs in the lecture as well.\n", "\n", "\n", "\n", "\n", "## Data Description:\n", "\n", "## Instructions:\n", "\n", "Part 1: KNN by hand for k=1\n", "- Read the Advertising data.\n", "- Get a subset of the data from row 5 to row 13.\n", "- Apply the kNN algorithm by hand and plot the first graph as given above.\n", "\n", "Part 2: Using the sklearn package\n", "- Read the Advertising dataset.\n", "- Split the data into train and test sets using the `train_test_split()` function.\n", "- Set `k_list` as the possible k values ranging from 1 to 70.\n", "- For each value of `k` in `k_list`:\n", "    - Use sklearn's `KNeighborsRegressor()` to fit the train data.\n", "    - Predict on the test data.\n", "    - Use the helper code to get the second plot above for k=1,10,70.\n", "\n", "\n", "\n", "## Hints:\n", "\n", "np.argsort()\n", "Returns the indices that would sort an array.\n", "\n", "df.iloc[]\n", "Returns a subset of the dataframe at the integer row/column positions passed as the argument.\n", "\n", "plt.plot()\n", "Plot y versus x as lines and/or markers.\n", "\n", "df.values\n", "Returns a NumPy representation of the DataFrame.\n", "\n", "pd.Series.idxmin()\n", "Returns the index of the first occurrence of the minimum over the requested axis.\n", "\n", "np.min()\n", "Returns the minimum along a given axis.\n", "\n", "np.max()\n", "Returns the maximum along a given axis.\n", "\n", "model.fit()\n", "Fit the k-nearest neighbors regressor from the training dataset.\n", "\n", "model.predict()\n", "Predict the target for the provided data.\n", "\n", "np.zeros()\n", "Returns a new array of given shape and type, filled with zeros.\n", "\n", "train_test_split(X,y)\n", "Split arrays or matrices into random train and test subsets.\n", "\n", "np.linspace()\n", "Returns evenly spaced numbers over a specified interval.\n", "\n", "KNeighborsRegressor(n_neighbors=k_value)\n", "Regression based on k-nearest neighbors.\n", "\n", "**Note:** This exercise is auto-graded, so please remember to set all the parameters to the values mentioned in the scaffold before marking." ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Import required libraries\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from sklearn.neighbors import KNeighborsRegressor\n", "from sklearn.model_selection import train_test_split\n", "%matplotlib inline\n" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Read the data from the file \"Advertising.csv\"\n", "filename = 'Advertising.csv'\n", "df_adv = pd.read_csv(filename)\n" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "      TV  Radio  Newspaper  Sales\n", "0  230.1   37.8       69.2   22.1\n", "1   44.5   39.3       45.1   10.4\n", "2   17.2   45.9       69.3    9.3\n", "3  151.5   41.3       58.5   18.5\n", "4  180.8   10.8       58.4   12.9" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Take a quick look at the dataset\n", "df_adv.head()\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: KNN by hand for $k=1$" ] },
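{ "cell_type": "markdown", "metadata": {}, "source": [ "*(Optional, not graded.)* Before working through the scaffold below, here is a minimal sketch of the $k=1$ idea described in the instructions: for a query value, compute the absolute distance to every training point and return the response of the closest one. The names `x_demo`, `y_demo` and `x_query` are made-up toy values for illustration only; they are not taken from Advertising.csv." ] },
{ "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Illustrative sketch only (not part of the graded scaffold):\n", "# a k = 1 nearest-neighbor lookup with plain NumPy on toy data.\n", "x_demo = np.array([10., 40., 90., 150.])   # toy predictor values\n", "y_demo = np.array([7., 9., 15., 19.])      # toy response values\n", "\n", "x_query = 100.\n", "\n", "# Index of the training point closest to the query\n", "nearest_idx = np.argmin(np.abs(x_demo - x_query))\n", "\n", "# The k = 1 prediction is simply that point's response\n", "print(nearest_idx, y_demo[nearest_idx])   # -> 2 15.0\n" ] },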
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Get a subset of the data i.e. rows 5 to 13\n", "# Use the TV column as the predictor\n", "x_true = df_adv.TV.iloc[5:13]\n", "\n", "# Use the Sales column as the response\n", "y_true = df_adv.Sales.iloc[5:13]\n", "\n", "# Sort the data to get indices ordered from lowest to highest TV values\n", "idx = np.argsort(x_true).values\n", "\n", "# Get the predictor data in the order given by idx above\n", "x_true = x_true.iloc[idx].values\n", "\n", "# Get the response data in the order given by idx above\n", "y_true = y_true.iloc[idx].values\n" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "### edTest(test_findnearest) ###\n", "# Define a function that finds the index of the nearest neighbor\n", "# and returns the value of the nearest neighbor.\n", "# Note that this is just for k = 1, where the distance function is\n", "# simply the absolute value.\n", "\n", "def find_nearest(array, value):\n", "\n", "    # Hint: To find idx, use the .idxmin() function on the series\n", "    idx = pd.Series(np.abs(array - value)).idxmin()\n", "\n", "    # Return the nearest neighbor index and value\n", "    return idx, array[idx]\n" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Create some synthetic x-values (these might not be in the actual dataset)\n", "x = np.linspace(np.min(x_true), np.max(x_true))\n", "\n", "# Initialize the y-values for the length of the synthetic x-values to zero\n", "y = np.zeros((len(x)))\n" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Apply the kNN algorithm to predict the y-value for each synthetic x-value\n", "for i, xi in enumerate(x):\n", "\n", "    # Get the Sales value of the true TV value closest to the given x-value\n", "    y[i] = y_true[find_nearest(x_true, xi)[0]]\n" ] },
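{ "cell_type": "markdown", "metadata": {}, "source": [ "*(Optional, not graded.)* As a quick sanity check, a `KNeighborsRegressor` with `n_neighbors=1` fit on the same eight points should reproduce the by-hand predictions above, except possibly where a synthetic x-value is exactly equidistant from two training points (tie-breaking may differ). The names `knn1` and `y_sklearn` are introduced here for illustration only." ] },
{ "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Optional sanity check (not part of the graded scaffold):\n", "# compare the by-hand k = 1 predictions with sklearn's 1-nearest-neighbor model.\n", "knn1 = KNeighborsRegressor(n_neighbors=1)\n", "knn1.fit(x_true.reshape(-1, 1), y_true)\n", "y_sklearn = knn1.predict(x.reshape(-1, 1))\n", "\n", "# The maximum difference should be 0 (up to tie-breaking at exact midpoints)\n", "print(np.max(np.abs(y - y_sklearn)))\n" ] },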
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting the data" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Plot the synthetic data along with the predictions\n", "plt.plot(x, y, '-.')\n", "\n", "# Plot the original data using black x's\n", "plt.plot(x_true, y_true, 'kx')\n", "\n", "# Set the title and axis labels\n", "plt.title('TV vs Sales')\n", "plt.xlabel('TV budget in $1000')\n", "plt.ylabel('Sales in $1000')\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: KNN for $k\\ge1$ using sklearn" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Read the data from the file \"Advertising.csv\"\n", "data_filename = 'Advertising.csv'\n", "df = pd.read_csv(data_filename)\n", "\n", "# Set 'TV' as the predictor variable 'x'\n", "x = df[[___]]\n", "\n", "# Set 'Sales' as the response variable 'y'\n", "y = df[___]\n" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "### edTest(test_shape) ###\n", "\n", "# Split the dataset into training and test sets, with 60% of the data\n", "# for training and 40% for testing, and with random_state=42\n", "x_train, x_test, y_train, y_test = train_test_split(___, ___, train_size=___, random_state=___)\n" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "### edTest(test_nums) ###\n", "\n", "# Choose the minimum k value based on the instructions given above\n", "k_value_min = ___\n", "\n", "# Choose the maximum k value based on the instructions given above\n", "k_value_max = ___\n", "\n", "\n", "# Create a list of integer k values between k_value_min and k_value_max using linspace\n", "k_list = np.linspace(k_value_min, k_value_max, 70)\n" ] },
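{ "cell_type": "markdown", "metadata": {}, "source": [ "*(Optional, not graded.)* Before filling in the loop below, here is the generic `fit`/`predict` pattern of `KNeighborsRegressor`, shown on tiny made-up arrays (`toy_x`, `toy_y`, `toy_model`) rather than on the exercise variables. Note that sklearn expects the predictors to be 2-D, which is why 'TV' is selected with double brackets above." ] },
{ "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Illustrative sketch only (not part of the graded scaffold):\n", "# the fit/predict pattern used in the loop below, on toy data.\n", "toy_x = np.array([[1.0], [2.0], [3.0], [4.0]])   # predictors must be 2-D\n", "toy_y = np.array([1.0, 2.0, 2.0, 3.0])\n", "\n", "toy_model = KNeighborsRegressor(n_neighbors=2)\n", "toy_model.fit(toy_x, toy_y)\n", "\n", "# The prediction is the average of the 2 nearest responses -> [2.]\n", "print(toy_model.predict(np.array([[2.5]])))\n" ] },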
{ "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "# Set the grid to plot the values\n", "fig, ax = plt.subplots(figsize=(10,6))\n", "\n", "# Variable used to alter the linewidth of each plot\n", "j = 0\n", "\n", "# Loop over all the k values\n", "for k_value in k_list:\n", "\n", "    # Create a kNN regression model\n", "    model = KNeighborsRegressor(n_neighbors=int(___))\n", "\n", "    # Fit the regression model on the training data\n", "    model.fit(___, ___)\n", "\n", "    # Use the trained model to predict on the test data\n", "    y_pred = model.predict(___)\n", "\n", "    # Helper code to plot the data along with the model predictions\n", "    colors = ['grey', 'r', 'b']\n", "    if k_value in [1, 10, 70]:\n", "        xvals = np.linspace(x.min(), x.max(), 100)\n", "        ypreds = model.predict(xvals)\n", "        ax.plot(xvals, ypreds, '-', label=f'k = {int(k_value)}', linewidth=j+2, color=colors[j])\n", "        j += 1\n", "\n", "ax.legend(loc='lower right', fontsize=20)\n", "ax.plot(x_train, y_train, 'x', label='train', color='k')\n", "ax.set_xlabel('TV budget in $1000', fontsize=20)\n", "ax.set_ylabel('Sales in $1000', fontsize=20)\n", "plt.tight_layout()\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### ⏸ In the plotting code above, re-run `ax.plot(x_train, y_train, 'x', label='train', color='k')` with `x_test` and `y_test` instead. Which k value do you think is best, and why?" ] },
{ "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "### edTest(test_chow1) ###\n", "# Type your answer within the quotes given\n", "answer1 = '___'\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }