{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Title\n",
"\n",
"**Exercise: Regression with Bagging**\n",
"\n",
"# Description\n",
"\n",
"The aim of this exercise is to understand regression with bagging (bootstrap aggregating)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Instructions:\n",
"- Read the dataset airquality.csv as a pandas dataframe.\n",
"- Take a quick look at the dataset.\n",
"- Split the data into train and test sets.\n",
"- Specify the number of bootstraps as 30 and a maximum tree depth of 3.\n",
"- Define a BaggingRegressor model that uses a decision tree as its base estimator.\n",
"- Fit the model on the train data.\n",
"- Use the helper code to predict using the mean model and individual estimators. The plot will look similar to the one given above.\n",
"- Predict on the test data using the first estimator and the mean model.\n",
"- Compute and display the test MSEs.\n",
"\n",
"# Hints:\n",
"\n",
"sklearn.model_selection.train_test_split() : Split arrays or matrices into random train and test subsets.\n",
"\n",
"BaggingRegressor() : Returns a bagging regressor instance.\n",
"\n",
"DecisionTreeRegressor() : A decision tree regressor.\n",
"\n",
"BaggingRegressor().estimators_ : The list of fitted base estimators. Use it to access any individual estimator.\n",
"\n",
"sklearn.metrics.mean_squared_error() : Mean squared error regression loss."
]
},
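{
"cell_type": "markdown",
"metadata": {},
"source": [
"The workflow described above can be sketched as follows, assuming synthetic data from `make_regression` in place of `airquality.csv` so that the cell runs on its own. The names `num_bootstraps` and `max_depth` are illustrative. The decision tree is passed to `BaggingRegressor` positionally because the keyword is `estimator` in recent scikit-learn releases and `base_estimator` in older ones. The final check confirms that the bagged mean model's prediction is the average of the individual trees' predictions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: synthetic data (assumption) stands in for airquality.csv\n",
"import numpy as np\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.ensemble import BaggingRegressor\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.tree import DecisionTreeRegressor\n",
"\n",
"# One predictor column, returned as a 2-D array like df[['Ozone']].values\n",
"X, y = make_regression(n_samples=120, n_features=1, noise=10.0, random_state=0)\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=102)\n",
"\n",
"num_bootstraps = 30  # number of bootstrap samples, i.e. number of trees\n",
"max_depth = 3        # maximum depth of each tree\n",
"\n",
"# Base estimator passed positionally (its keyword differs across sklearn versions)\n",
"model = BaggingRegressor(DecisionTreeRegressor(max_depth=max_depth),\n",
"                         n_estimators=num_bootstraps, random_state=0)\n",
"model.fit(X_train, y_train)\n",
"\n",
"# Predict with the first bootstrapped tree and with the bagged (mean) model\n",
"y_pred_first = model.estimators_[0].predict(X_test)\n",
"y_pred_mean = model.predict(X_test)\n",
"\n",
"# The mean model averages the predictions of all fitted trees\n",
"all_preds = np.array([tree.predict(X_test) for tree in model.estimators_])\n",
"assert np.allclose(all_preds.mean(axis=0), y_pred_mean)\n",
"\n",
"print(f'Test MSE (first estimator): {mean_squared_error(y_test, y_pred_first):.2f}')\n",
"print(f'Test MSE (bagged mean):     {mean_squared_error(y_test, y_pred_mean):.2f}')"
]
},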
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Import necessary libraries\n",
"\n",
"import numpy as np\n",
"from numpy import mean\n",
"from numpy import std\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.ensemble import BaggingRegressor\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd \n",
"import itertools\n",
"from sklearn.tree import DecisionTreeRegressor\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Read the dataset\n",
"df = pd.read_csv(\"airquality.csv\",index_col=0)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Take a quick look at the data\n",
"df.head(10)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# We will only use Ozone for this exercise. Drop the rows where Ozone is missing (NaN)\n",
"df = df[df.Ozone.notna()]"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# Assign Ozone as the predictor variable \"x\" (a 2-D array) and Temp as the response variable \"y\"\n",
"x = df[['Ozone']].values\n",
"y = df['Temp']"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# Split the data into train and test sets with train size as 0.8 and random_state as 102\n",
"x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8, random_state=102)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bagging Regressor"
]
},
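{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea behind `BaggingRegressor` concrete, here is a hand-rolled sketch of bagging: draw bootstrap samples (rows sampled with replacement), fit one depth-limited `DecisionTreeRegressor` per sample, and average their predictions. The helper names `bagging_fit` and `bagging_predict` and the noisy sine data are hypothetical, chosen only for illustration; scikit-learn's own implementation differs in its details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hand-rolled bagging sketch (illustrative, not scikit-learn's internals)\n",
"import numpy as np\n",
"from sklearn.tree import DecisionTreeRegressor\n",
"\n",
"# Toy 1-D data (assumption): a noisy sine curve\n",
"rng = np.random.default_rng(0)\n",
"X = np.sort(rng.uniform(0, 10, size=100)).reshape(-1, 1)\n",
"y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)\n",
"\n",
"def bagging_fit(X, y, n_bootstraps=30, max_depth=3, seed=0):\n",
"    # Fit one depth-limited tree per bootstrap sample\n",
"    rng = np.random.default_rng(seed)\n",
"    n = len(X)\n",
"    trees = []\n",
"    for _ in range(n_bootstraps):\n",
"        idx = rng.integers(0, n, size=n)  # draw n row indices with replacement\n",
"        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)\n",
"        tree.fit(X[idx], y[idx])\n",
"        trees.append(tree)\n",
"    return trees\n",
"\n",
"def bagging_predict(trees, X):\n",
"    # Mean model: average the individual trees' predictions\n",
"    return np.mean([tree.predict(X) for tree in trees], axis=0)\n",
"\n",
"trees = bagging_fit(X, y)\n",
"y_hat = bagging_predict(trees, X)\n",
"print(len(trees), y_hat.shape)  # 30 (100,)"
]
},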
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"| | Ozone | Solar.R | Wind | Temp | Month | Day |\n",
"|---|---|---|---|---|---|---|\n",
"| 1 | 41.0 | 190.0 | 7.4 | 67 | 5 | 1 |\n",
"| 2 | 36.0 | 118.0 | 8.0 | 72 | 5 | 2 |\n",
"| 3 | 12.0 | 149.0 | 12.6 | 74 | 5 | 3 |\n",
"| 4 | 18.0 | 313.0 | 11.5 | 62 | 5 | 4 |\n",
"| 6 | 28.0 | NaN | 14.9 | 66 | 5 | 6 |\n",
"| 7 | 23.0 | 299.0 | 8.6 | 65 | 5 | 7 |\n",
"| 8 | 19.0 | 99.0 | 13.8 | 59 | 5 | 8 |\n",
"| 9 | 8.0 | 19.0 | 20.1 | 61 | 5 | 9 |\n",
"| 11 | 7.0 | NaN | 6.9 | 74 | 5 | 11 |\n",
"| 12 | 16.0 | 256.0 | 9.7 | 69 | 5 | 12 |\n",
"| 13 | 11.0 | 290.0 | 9.2 | 66 | 5 | 13 |\n",
"| 14 | 14.0 | 274.0 | 10.9 | 68 | 5 | 14 |\n",
"| 15 | 18.0 | 65.0 | 13.2 | 58 | 5 | 15 |\n",
"| 16 | 14.0 | 334.0 | 11.5 | 64 | 5 | 16 |\n",
"| 17 | 34.0 | 307.0 | 12.0 | 66 | 5 | 17 |\n",
"| 18 | 6.0 | 78.0 | 18.4 | 57 | 5 | 18 |\n",
"| 19 | 30.0 | 322.0 | 11.5 | 68 | 5 | 19 |\n",
"| 20 | 11.0 | 44.0 | 9.7 | 62 | 5 | 20 |\n",
"| 21 | 1.0 | 8.0 | 9.7 | 59 | 5 | 21 |\n",
"| 22 | 11.0 | 320.0 | 16.6 | 73 | 5 | 22 |\n",
"| 23 | 4.0 | 25.0 | 9.7 | 61 | 5 | 23 |\n",
"| 24 | 32.0 | 92.0 | 12.0 | 61 | 5 | 24 |\n",
"| 28 | 23.0 | 13.0 | 12.0 | 67 | 5 | 28 |\n",