{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Title\n",
"\n",
"**Exercise: Bagging Classification with Decision Boundary**\n",
"\n",
"# Description\n",
"\n",
"The goal of this exercise is to use **Bagging** (Bootstrap Aggregating) to solve a classification problem and visualize the influence of Bagging on trees of varying depths.\n",
"\n",
"Your final plot should resemble the one below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Instructions:\n",
"- Read the dataset `agriland.csv`.\n",
"- Assign the predictor and response variables as `X` and `y`.\n",
"- Split the data into train and test sets with `test_size=0.2` and `random_state=44`.\n",
"- Fit a single `DecisionTreeClassifier()` and find the accuracy of your prediction.\n",
"- Complete the helper function `prediction_by_bagging()` to find the average predictions for a given number of bootstraps.\n",
"- Now perform **Bagging** using the helper function, and compute the new accuracy.\n",
"- Plot the accuracy against an increasing number of bootstraps.\n",
"- Finally, use the helper code to plot the decision boundaries for varying `max_depth` along with `num_bootstraps`. Investigate the effect of increasing bootstraps on the variance.\n",
"\n",
"# Hints:\n",
"\n",
"`sklearn.tree.DecisionTreeClassifier()` : A decision tree classifier.\n",
"\n",
"`np.random.choice()` : Generates a random sample from a given 1-D array.\n",
"\n",
"`plt.subplots()` : Creates a figure and a set of subplots.\n",
"\n",
"`ax.plot()` : Plots y versus x as lines and/or markers.\n",
"\n",
"**Note: This exercise is auto-graded and you can try multiple attempts.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bagging Classification"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Import required libraries\n",
"\n",
"%matplotlib inline\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn import metrics\n",
"import scipy.optimize as opt\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"# to be used for plotting later\n",
"\n",
"from matplotlib.colors import ListedColormap\n",
"cmap_light = ListedColormap(['#FFF4E5','#D2E3EF'])\n",
"cmap_bold = ListedColormap(['#F7345E','#80C3BD'])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | latitude | \n", "longitude | \n", "land_type | \n", "
---|---|---|---|
0 | \n", "-0.071860 | \n", "-1.297410 | \n", "1.0 | \n", "
1 | \n", "-0.179482 | \n", "-0.874892 | \n", "1.0 | \n", "
2 | \n", "-1.217428 | \n", "-1.352105 | \n", "0.0 | \n", "
3 | \n", "1.143306 | \n", "-0.894172 | \n", "1.0 | \n", "
4 | \n", "-3.033199 | \n", "0.818646 | \n", "0.0 | \n", "