{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Title :\n",
    "Exercise: Finding the Best k in kNN Regression\n",
    "\n",
    "## Description :\n",
    "The goal here is to find the value of k of the best performing model based on the test MSE.\n",
    "\n",
    "<img src=\"../fig/fig4.jpeg\" style=\"width: 500px;\">\n",
    "\n",
    "## Data Description:\n",
    "\n",
    "## Instructions:\n",
    "- Read the data into a Pandas dataframe object. \n",
    "- Select the sales column as the response variable and TV budget column as the predictor variable.\n",
    "- Make a train-test split using sklearn.model_selection.train_test_split .\n",
    "- Create a list of integer k values using numpy.linspace .\n",
    "- For each value of k\n",
    "    - Fit a kNN regression on train set.\n",
    "    - Calculate MSE on test set and store it.\n",
    "- Plot the test MSE values for each k.\n",
    "- Find the k value associated with the lowest test MSE.\n",
    "\n",
    "\n",
    "## Hints: \n",
    "\n",
    "<a href=\"http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html\" target=\"_blank\">train_test_split(X,y)</a>\n",
    "Split arrays or matrices into random train and test subsets. \n",
    "\n",
    "<a href=\"https://numpy.org/doc/stable/reference/generated/numpy.linspace.html\" target=\"_blank\">np.linspace()</a>\n",
    "Returns evenly spaced numbers over a specified interval.\n",
    "\n",
    "<a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html\" target=\"_blank\">KNeighborsRegressor(n_neighbors=k_value)</a>\n",
    "Regression-based on k-nearest neighbors. \n",
    "\n",
    "<a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html#sklearn.neighbors.KNeighborsRegressor.predict\" target=\"_blank\">model.predict()</a>\n",
    "Predict the target for the provided data.\n",
    "\n",
    "<a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error\" target=\"_blank\">mean_squared_error()</a>\n",
    "Computes the mean squared error regression loss.\n",
    "\n",
    "<a href=\"https://www.geeksforgeeks.org/python-dictionary-keys-method/?ref=lbp\" target=\"_blank\">dict.keys()</a>\n",
    "Returns a view object that displays a list of all the keys in the dictionary.\n",
    "\n",
    "<a href=\"https://www.programiz.com/python-programming/methods/dictionary/values\" target=\"_blank\">dict.values()</a>\n",
    "Returns a list of all the values available in a given dictionary.\n",
    "\n",
    "<a href=\"https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html\" target=\"_blank\">plt.plot()</a>\n",
    "Plot y versus x as lines and/or markers.\n",
    "\n",
    "<a href=\"https://www.tutorialspoint.com/python/dictionary_items.htm\" target=\"_blank\">dict.items()</a>\n",
    "Returns a list of dict's (key, value) tuple pairs.\n",
    "\n",
    "\n",
    "**Note:** This exercise is auto-graded and you can try multiple attempts. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import necessary libraries\n",
    "import numpy as np\n",
    "import pandas as pd \n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.utils import shuffle\n",
    "from sklearn.metrics import r2_score\n",
    "from sklearn.metrics import mean_squared_error\n",
    "from sklearn.neighbors import KNeighborsRegressor\n",
    "from sklearn.model_selection import train_test_split\n",
    "%matplotlib inline\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reading the standard Advertising dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the file 'Advertising.csv' into a Pandas dataset\n",
    "df = pd.read_csv('Advertising.csv')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Take a quick look at the data\n",
    "df.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set the 'TV' column as predictor variable\n",
    "x = df[[___]]\n",
    "\n",
    "# Set the 'Sales' column as response variable \n",
    "y = df[___]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Train-Test split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "### edTest(test_shape) ###\n",
    "# Split the dataset in training and testing with 60% training set and \n",
    "# 40% testing set \n",
    "x_train, x_test, y_train, y_test = train_test_split(___,___,train_size=___,random_state=66)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "### edTest(test_nums) ###\n",
    "# Choosing k range from 1 to 70\n",
    "k_value_min = 1\n",
    "k_value_max = 70\n",
    "\n",
    "# Create a list of integer k values between k_value_min and \n",
    "# k_value_max using linspace\n",
    "k_list = np.linspace(k_value_min,k_value_max,num=70,dtype=int)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Model fit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Setup a grid for plotting the data and predictions\n",
    "fig, ax = plt.subplots(figsize=(10,6))\n",
    "\n",
    "# Create a dictionary to store the k value against MSE fit {k: MSE@k} \n",
    "knn_dict = {}\n",
    "\n",
    "# Variable used for altering the linewidth of values kNN models\n",
    "j=0\n",
    "\n",
    "# Loop over all k values\n",
    "for k_value in k_list:   \n",
    "    \n",
    "    # Create a KNN Regression model for the current k\n",
    "    model = KNeighborsRegressor(n_neighbors=int(___))\n",
    "    \n",
    "    # Fit the model on the train data\n",
    "    model.fit(x_train,y_train)\n",
    "    \n",
    "    # Use the trained model to predict on the test data\n",
    "    y_pred = model.predict(___)\n",
    "    \n",
    "    # Calculate the MSE of the test data predictions\n",
    "    MSE = ____\n",
    "\n",
    "    # Store the MSE values of each k value in the dictionary\n",
    "    knn_dict[k_value] = ___\n",
    "    \n",
    "    \n",
    "    # Helper code to plot the data and various kNN model predictions\n",
    "    colors = ['grey','r','b']\n",
    "    if k_value in [1,10,70]:\n",
    "        xvals = np.linspace(x.min(),x.max(),100)\n",
    "        ypreds = model.predict(xvals)\n",
    "        ax.plot(xvals, ypreds,'-',label = f'k = {int(k_value)}',linewidth=j+2,color = colors[j])\n",
    "        j+=1\n",
    "        \n",
    "ax.legend(loc='lower right',fontsize=20)\n",
    "ax.plot(x_train, y_train,'x',label='test',color='k')\n",
    "ax.set_xlabel('TV budget in $1000',fontsize=20)\n",
    "ax.set_ylabel('Sales in $1000',fontsize=20)\n",
    "plt.tight_layout()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Graph plot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot a graph which depicts the relation between the k values and MSE\n",
    "plt.figure(figsize=(8,6))\n",
    "plt.plot(___, ___,'k.-',alpha=0.5,linewidth=2)\n",
    "\n",
    "# Set the title and axis labels\n",
    "plt.xlabel('k',fontsize=20)\n",
    "plt.ylabel('MSE',fontsize = 20)\n",
    "plt.title('Test $MSE$ values for different k values - KNN regression',fontsize=20)\n",
    "plt.tight_layout()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Find the best knn model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "### edTest(test_mse) ###\n",
    "\n",
    "# Find the lowest MSE among all the kNN models\n",
    "min_mse = min(___)\n",
    "\n",
    "# Use list comprehensions to find the k value associated with the lowest MSE\n",
    "best_model = [key  for (key, value) in knn_dict.items() if value == min_mse]\n",
    "\n",
    "# Print the best k-value\n",
    "print (\"The best k value is \",best_model,\"with a MSE of \", min_mse)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ⏸ From the options below, how would you classify the \"goodness\" of your model?\n",
    "\n",
    "#### A. Good\n",
    "#### B. Satisfactory\n",
    "#### C. Bad"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "### edTest(test_chow1) ###\n",
    "# Submit an answer choice as a string below (eg. if you choose option A, put 'A')\n",
    "answer1 = '___'\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper code to compute the R2_score of your best model\n",
    "model = KNeighborsRegressor(n_neighbors=best_model[0])\n",
    "model.fit(x_train,y_train)\n",
    "y_pred_test = model.predict(x_test)\n",
    "\n",
    "# Print the R2 score of the model\n",
    "print(f\"The R2 score for your model is {r2_score(y_test, y_pred_test)}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ⏸ After observing the $R^2$ value, how would you now classify your model?\n",
    "\n",
    "#### A. Good\n",
    "#### B. Satisfactory\n",
    "#### C. Bad\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "### edTest(test_chow2) ###\n",
    "# Submit an answer choice as a string below (eg. if you choose option A, put 'A')\n",
    "answer2 = '___'\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}