{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Title :\n",
"Exercise: Simple kNN Regression\n",
"\n",
"## Description :\n",
"The goal of this exercise is to **re-create the plots** given below. You would have come across these graphs in the lecture as well.\n",
"\n",
"\n",
"
\n",
"\n",
"## Data Description:\n",
"\n",
"## Instructions:\n",
"\n",
"Part 1: KNN by hand for k=1\n",
"- Read the Advertisement data.\n",
"- Get a subset of the data from row 5 to row 13.\n",
"- Apply the kNN algorithm by hand and plot the first graph as given above.\n",
"\n",
"Part 2: Using sklearn package\n",
"- Read the Advertisement dataset.\n",
"- Split the data into train and test sets using the `train_test_split()` function.\n",
"- Set `k_list` as the possible k values ranging from 1 to 70.\n",
"- For each value of `k` in `k_list`:\n",
" - Use `sklearn KNearestNeighbors()` to fit train data.\n",
" - Predict on the test data.\n",
" - Use the helper code to get the second plot above for k=1,10,70.\n",
"\n",
"\n",
"\n",
"## Hints: \n",
"\n",
"np.argsort()\n",
"Returns the indices that would sort an array. \n",
"\n",
"df.iloc[]\n",
"Returns a subset of the dataframe that is contained in the column range passed as the argument.\n",
"\n",
"plt.plot()\n",
"Plot y versus x as lines and/or markers.\n",
"\n",
"df.values\n",
"Returns a Numpy representation of the DataFrame.\n",
"\n",
"pd.idxmin()\n",
"Returns index of the first occurrence of minimum over requested axis.\n",
"\n",
"np.min()\n",
"Returns the minimum along a given axis.\n",
"\n",
"np.max()\n",
"Returns the maximum along a given axis.\n",
"\n",
"model.fit()\n",
"Fit the k-nearest neighbors regressor from the training dataset.\n",
"\n",
"model.predict()\n",
"Predict the target for the provided data.\n",
"\n",
"np.zeros()\n",
"Returns a new array of given shape and type, filled with zeros.\n",
"\n",
"train_test_split(X,y)\n",
"Split arrays or matrices into random train and test subsets. \n",
"\n",
"np.linspace()\n",
"Returns evenly spaced numbers over a specified interval.\n",
"\n",
"KNeighborsRegressor(n_neighbors=k_value)\n",
"Regression-based on k-nearest neighbors. \n",
"\n",
"**Note:** This exercise is auto-graded, hence please remember to set all the parameters to the values mentioned in the scaffold before marking."
]
},
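{
"cell_type": "markdown",
"metadata": {},
"source": [
"The by-hand procedure from Part 1 can be sketched on made-up numbers before moving to the real data. This is a minimal illustration, not the graded solution: the arrays `x_train` and `y_train` and the helper `knn_predict` are hypothetical names introduced here, and the values are not taken from `Advertising.csv`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical toy data (assumption: one predictor, one response)\n",
"x_train = np.array([1.0, 3.0, 6.0, 10.0])\n",
"y_train = np.array([2.0, 4.0, 5.0, 9.0])\n",
"\n",
"def knn_predict(x_query, x_train, y_train, k=1):\n",
"    # Absolute distance from the query point to every training point\n",
"    distances = np.abs(x_train - x_query)\n",
"    # np.argsort gives the indices that sort the distances; keep the k smallest\n",
"    nearest = np.argsort(distances)[:k]\n",
"    # The prediction is the mean response of those k neighbors\n",
"    return y_train[nearest].mean()\n",
"\n",
"pred_k1 = knn_predict(5.0, x_train, y_train, k=1)\n",
"pred_k2 = knn_predict(5.0, x_train, y_train, k=2)\n",
"```\n",
"\n",
"With `k=1` the prediction is simply the response of the single closest training point; a larger `k` averages over more neighbors and smooths the fit.\n"
]
},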
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Import required libraries\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.neighbors import KNeighborsRegressor\n",
"from sklearn.model_selection import train_test_split\n",
"%matplotlib inline\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Read the data from the file \"Advertising.csv\"\n",
"filename = 'Advertising.csv'\n",
"df_adv = pd.read_csv(filename)\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | TV | \n", "Radio | \n", "Newspaper | \n", "Sales | \n", "
---|---|---|---|---|
0 | \n", "230.1 | \n", "37.8 | \n", "69.2 | \n", "22.1 | \n", "
1 | \n", "44.5 | \n", "39.3 | \n", "45.1 | \n", "10.4 | \n", "
2 | \n", "17.2 | \n", "45.9 | \n", "69.3 | \n", "9.3 | \n", "
3 | \n", "151.5 | \n", "41.3 | \n", "58.5 | \n", "18.5 | \n", "
4 | \n", "180.8 | \n", "10.8 | \n", "58.4 | \n", "12.9 | \n", "