{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Title\n",
"\n",
"**Exercise: A.2 - Simple kNN regression**\n",
"\n",
"# Description\n",
"\n",
"The goal of this exercise is to **re-create the plots** below from the lecture.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"\n",
"\n",
"[Figure: the two kNN regression plots from the lecture; images not recovered]\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# Instructions:\n",
"Part 1 **kNN by hand for k=1**\n",
"\n",
"- Read the Advertisement data\n",
"- Get a subset of the data from row 5 to row 13\n",
"- Apply the kNN algorithm by hand and plot the first graph as given above.\n",
"\n",
"Part 2 **Using the sklearn package**\n",
"- Read the entire Advertisement dataset\n",
"- Split the data into train and test sets using the `train_test_split()` function\n",
"- Select `k_list` as possible k values ranging from 1 to 70.\n",
"- For each value of k in `k_list`:\n",
"    - Use sklearn `KNeighborsRegressor()` to fit the train data\n",
"    - Predict on the test data\n",
"    - Use the helper code to get the second plot above for k=1,10,70\n",
"\n",
"# Hints:\n",
"np.argsort() : Returns the indices that would sort an array. \n",
"\n",
"df.iloc[] : Returns the subset of the dataframe selected by the integer positions (rows and, optionally, columns) passed as the argument\n",
"\n",
"df.values : Returns a Numpy representation of the DataFrame.\n",
"\n",
"df.idxmin() : Returns the index of the first occurrence of the minimum over the requested axis.\n",
"\n",
"np.min() : Returns the minimum along a given axis.\n",
"\n",
"np.max() : Returns the maximum along a given axis.\n",
"\n",
"np.zeros() : Returns a new array of given shape and type, filled with zeros.\n",
"\n",
"train_test_split(X,y) : Splits arrays or matrices into random train and test subsets.\n",
"\n",
"np.linspace() : Returns evenly spaced numbers over a specified interval.\n",
"\n",
"KNeighborsRegressor(n_neighbors=k_value) : Regression based on k-nearest neighbors.\n",
"\n",
"Note: This exercise is **auto-graded, and you can make multiple attempts.**"
]
},
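The "kNN by hand" step in Part 1 can be sketched roughly as follows. This is a minimal illustration, not the exercise's reference solution. It assumes TV budget is the single predictor and Sales the response, and it uses the six rows visible in the table output as stand-in data rather than rows 5 to 13 of the file. The names `knn_predict`, `x_train`, `y_train`, `x_grid` and `y_grid` are hypothetical and do not come from the exercise's helper code.

```python
import numpy as np

# Stand-in data: TV and Sales values of rows 0-5 as shown in the table output
# (assumption: the exercise itself uses rows 5 to 13 of the Advertisement data).
x_train = np.array([230.1, 44.5, 17.2, 151.5, 180.8, 8.7])
y_train = np.array([22.1, 10.4, 9.3, 18.5, 12.9, 7.2])


def knn_predict(x_query, x_train, y_train, k=1):
    """Predict y at each query point as the mean of the k nearest training targets."""
    x_query = np.atleast_1d(x_query)
    # Pairwise absolute distances, shape (n_query, n_train)
    dist = np.abs(x_query[:, None] - x_train[None, :])
    # Column indices of the k closest training points for every query point
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


# Evaluate on evenly spaced points spanning the observed TV budgets
x_grid = np.linspace(np.min(x_train), np.max(x_train), 5)
y_grid = knn_predict(x_grid, x_train, y_train, k=1)
```

With k=1 the prediction at any query point is simply the Sales value of the closest TV budget, so plotting `y_grid` against a finer `x_grid` would give a step-shaped curve. Larger k averages more neighbours and smooths the steps.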
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"|   | TV | Radio | Newspaper | Sales |\n",
"|---|---|---|---|---|\n",
"| 0 | 230.1 | 37.8 | 69.2 | 22.1 |\n",
"| 1 | 44.5 | 39.3 | 45.1 | 10.4 |\n",
"| 2 | 17.2 | 45.9 | 69.3 | 9.3 |\n",
"| 3 | 151.5 | 41.3 | 58.5 | 18.5 |\n",
"| 4 | 180.8 | 10.8 | 58.4 | 12.9 |\n",
"| 5 | 8.7 | 48.9 | 75.0 | 7.2 |\n", "