Key Word(s): Decision Trees, Regression Trees, Stopping Conditions, Pruning, Bagging, Overfitting
Title:
Classification using Decision Tree
Description:
The goal of this exercise is to get comfortable using Decision Trees for classification in sklearn. Eventually, you will produce a plot similar to the one given below:
Instructions:
- Read the train and test data files as pandas DataFrames.
- Use "minority" and "bachelor" as the predictor variables and "won" as the response.
- Fit a decision tree of depth 2 and another of depth 10 on the training data.
- Call the function plot_boundary to visualise the decision boundaries of these two classifiers.
- Increase the number of predictor variables as specified in the scaffold.
- Initialize decision tree classifiers of depths 2, 10, and 15.
- Fit each model on the training data.
- Compute the train and test accuracy scores for each classifier.
- Use the helper code to look at the feature importance of the predictors from the decision tree of depth 15.
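The depth comparison described in these instructions is a classic overfitting check. The sketch below reproduces the idea on synthetic data from sklearn.datasets.make_classification (a stand-in assumption, not the election data): as max_depth grows, training accuracy climbs toward 1.0 while test accuracy typically stalls or drops.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy binary data standing in for the election dataset (assumption)
X, y = make_classification(n_samples=400, n_features=8, n_informative=3,
                           flip_y=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

results = {}
for depth in [2, 10, 15, None]:  # None lets the tree grow until every leaf is pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    clf.fit(X_train, y_train)
    # score() returns mean accuracy on the data it is given
    results[depth] = (clf.score(X_train, y_train), clf.score(X_test, y_test))

for depth, (train_acc, test_acc) in results.items():
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

A widening gap between the train and test columns as depth increases is the pattern to look for in the table produced at the end of this exercise.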
Hints:
sklearn.tree.DecisionTreeClassifier() Generates a decision tree classifier.
classifier.score() Returns the mean accuracy on the given data and labels.
classifier.fit() Builds a decision tree classifier from the training set (X, y).
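A minimal sketch of the three calls listed in the hints, run on a small synthetic dataset (an assumption standing in for the election data). It also checks that classifier.score() gives the same number as sklearn.metrics.accuracy_score applied to the model's predictions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 200 rows, 2 continuous features, binary label
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# DecisionTreeClassifier lives in sklearn.tree; max_depth caps the tree depth
clf = DecisionTreeClassifier(max_depth=2, random_state=0)

# fit() grows the tree from (X, y) and returns the fitted estimator
clf.fit(X, y)

# score() is mean accuracy, the same quantity accuracy_score computes from predictions
acc_score = clf.score(X, y)
acc_manual = accuracy_score(y, clf.predict(X))
print(acc_score, acc_manual, clf.get_depth())
```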
Note: This exercise is auto-graded and you can try multiple attempts.
In [0]:
# Import necessary libraries
import numpy as np
import pandas as pd
import sklearn as sk
import seaborn as sns
from sklearn import tree
import matplotlib.pyplot as plt
from helper import plot_boundary
from prettytable import PrettyTable
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
pd.set_option('display.width', 100)
pd.set_option('display.max_columns', 20)
plt.rcParams["figure.figsize"] = (12,8)
In [0]:
# Read the data file "election_train.csv" as a Pandas dataframe
elect_train = pd.read_csv("election_train.csv")
# Read the data file "election_test.csv" as a Pandas dataframe
elect_test = pd.read_csv("election_test.csv")
# Take a quick look at the train data
elect_train.head()
In [0]:
# Set the columns minority and bachelor as train data predictors
X_train = elect_train[['minority', 'bachelor']]
# Set the columns minority and bachelor as test data predictors
X_test = elect_test[['minority', 'bachelor']]
# Set the column "won" as the train response variable
y_train = elect_train['won']
# Set the column "won" as the test response variable
y_test = elect_test['won']
In [0]:
# Initialize a Decision Tree classifier with a depth of 2
dt1 = DecisionTreeClassifier(max_depth=2)
# Fit the classifier on the train data
dt1.fit(X_train, y_train)
# Initialize a Decision Tree classifier with a depth of 10
dt2 = DecisionTreeClassifier(max_depth=10)
# Fit the classifier on the train data
dt2.fit(X_train, y_train)
In [0]:
# Call the function plot_boundary from the helper file to get
# the decision boundaries of both the classifiers
plot_boundary(elect_train, dt1, dt2)
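The implementation of plot_boundary in helper.py is not shown here. A common way to draw a decision boundary, sketched below as an assumption about the general technique rather than the helper's actual code, is to evaluate the fitted tree on a dense grid over the two predictors and shade each grid point by its predicted class. The function decision_grid and its parameters are hypothetical, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def decision_grid(clf, X, resolution=200, margin=0.5):
    """Hypothetical helper: predict clf's class on a grid spanning the 2 columns of X."""
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, resolution),
                         np.linspace(y_min, y_max, resolution))
    grid = np.c_[xx.ravel(), yy.ravel()]  # shape (resolution**2, 2)
    zz = clf.predict(grid).reshape(xx.shape)  # predicted class per grid point
    return xx, yy, zz, grid

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
deep = DecisionTreeClassifier(max_depth=10, random_state=0).fit(X, y)

xx, yy, zz_shallow, grid = decision_grid(shallow, X)
_, _, zz_deep, _ = decision_grid(deep, X)

# apply() returns the leaf index for each point; a depth-2 tree has at most 4 leaves
print("shallow leaf regions:", len(np.unique(shallow.apply(grid))))
print("deep leaf regions:", len(np.unique(deep.apply(grid))))

# To draw it (requires matplotlib):
# plt.contourf(xx, yy, zz_shallow, alpha=0.3); plt.scatter(X[:, 0], X[:, 1], c=y)
```

Because every split thresholds a single feature, each leaf covers an axis-aligned rectangle of the plane; a deeper tree usually produces many more, smaller rectangles, which is how overfitting tends to show up in these plots.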
In [0]:
# Set of predictor columns
pred_cols = ['minority', 'density','hispanic','obesity','female','income','bachelor','inactivity']
# Use the columns above as the predictor data from the train data
X_train = elect_train[pred_cols]
# Use the columns above as the predictor data from the test data
X_test = elect_test[pred_cols]
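# Initialize a Decision Tree classifier with a depth of 2
dt1 = DecisionTreeClassifier(max_depth=2)
# Initialize a Decision Tree classifier with a depth of 10
dt2 = DecisionTreeClassifier(max_depth=10)
# Initialize a Decision Tree classifier with a depth of 15
dt3 = DecisionTreeClassifier(max_depth=15)
# Fit all the classifiers on the train data
dt1.fit(X_train, y_train)
dt2.fit(X_train, y_train)
dt3.fit(X_train, y_train)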
In [0]:
### edTest(test_accuracy) ###
# Compute the train and test accuracy for the first decision tree classifier of depth 2
dt1_train_acc = dt1.score(X_train, y_train)
dt1_test_acc = dt1.score(X_test, y_test)
# Compute the train and test accuracy for the second decision tree classifier of depth 10
dt2_train_acc = dt2.score(X_train, y_train)
dt2_test_acc = dt2.score(X_test, y_test)
# Compute the train and test accuracy for the third decision tree classifier of depth 15
dt3_train_acc = dt3.score(X_train, y_train)
dt3_test_acc = dt3.score(X_test, y_test)
In [0]:
# Helper code to display the scores of each classifier as a table
pt = PrettyTable()
pt.field_names = ['Max Depth', 'Number of Features', 'Train Accuracy', 'Test Accuracy']
# All three classifiers above were fit on every column in pred_cols
pt.add_row([2, len(pred_cols), round(dt1_train_acc, 4), round(dt1_test_acc, 4)])
pt.add_row([10, len(pred_cols), round(dt2_train_acc, 4), round(dt2_test_acc, 4)])
pt.add_row([15, len(pred_cols), round(dt3_train_acc,4), round(dt3_test_acc,4)])
print(pt)
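The helper code for inspecting feature importances is not included in this section. As a minimal sketch of what such code typically does, a fitted DecisionTreeClassifier exposes impurity-based importances through its feature_importances_ attribute: one non-negative value per column, normalized to sum to 1. The data and column names below are synthetic placeholders, not the election predictors.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column names; shuffle=False keeps the 2 informative features first
cols = ["f_informative_1", "f_informative_2", "noise_1", "noise_2", "noise_3"]
X, y = make_classification(n_samples=300, n_features=5, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=1)
X = pd.DataFrame(X, columns=cols)

dt = DecisionTreeClassifier(max_depth=15, random_state=1).fit(X, y)

# Impurity-based importance: total weighted impurity decrease per feature, normalized
importances = pd.Series(dt.feature_importances_, index=cols).sort_values(ascending=False)
print(importances)
```

These importances are computed from the training data, so for a deep tree they partly reflect splits that fit noise; sklearn.inspection.permutation_importance evaluated on the test set is a common cross-check.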
In [0]: