Title:
Exercise: Visualizing a Decision Tree
Description:
The aim of this exercise is to visualize the tree built by Decision Tree Classification or Regression. Your tree should look similar to the one shown below.
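Besides plotting, scikit-learn can render a fitted tree as indented text rules with sklearn.tree.export_text, which is handy for a quick look without a figure. The sketch below uses a hypothetical six-point, one-feature dataset (not the county data) only to show what that output looks like.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: one feature whose labels split cleanly between 0.3 and 0.7
X = np.array([[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text returns the fitted tree as a string of nested "feature <= threshold" rules
rules = export_text(clf, feature_names=["minority"])
print(rules)
```

Because one split already separates the classes perfectly, the tree stops at depth 1 even though max_depth=3 allows more.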
Data Description:
We are trying to predict the winner of the 2016 US Presidential election (Trump vs. Clinton) in each US county. To do this, we will consider several predictors, including "minority" (the percentage of residents who are minorities) and "bachelor" (the percentage of resident adults with a bachelor's degree or higher).
Instructions:
- Read the datafile county_election_train.csv into a Pandas data frame.
- Create the response variable based on the columns trump and clinton.
- Initialize a Decision Tree classifier of depth 3 and fit it on the training data.
- Visualize the decision tree.
Hints:
sklearn.tree.DecisionTreeClassifier(): Generates a decision tree classifier.
classifier.fit(): Builds a decision tree classifier from the training set (X, y).
plt.scatter(): A scatter plot of y vs. x with varying marker size and/or color.
plt.xlabel(): Sets the label for the x-axis.
plt.ylabel(): Sets the label for the y-axis.
plt.legend(): Places a legend on the Axes.
tree.plot_tree(): Plots a decision tree.
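A short sketch of the first two hints on synthetic data (the feature values and the labelling rule are invented for illustration, not taken from the county files): fit a depth-3 DecisionTreeClassifier and inspect the size of the resulting tree. Since every split is binary, a tree of depth 3 has at most 2^3 = 8 leaves and 2^4 - 1 = 15 nodes, which gives a quick sanity check on the max_depth setting.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data standing in for two percentage-valued predictors
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 100).astype(int)  # made-up labelling rule

classifier = DecisionTreeClassifier(max_depth=3, criterion="gini", random_state=0)
classifier.fit(X, y)

# Structure of the fitted tree: depth, number of leaves, and total node count
print("depth:", classifier.get_depth())
print("leaves:", classifier.get_n_leaves())
print("nodes:", classifier.tree_.node_count)
```

Every internal node has exactly two children, so the node count always equals 2 * leaves - 1.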
Note: This exercise is auto-graded and you can try multiple attempts.
# Import necessary libraries
import numpy as np
import pandas as pd
import sklearn as sk
import seaborn as sns
from sklearn import tree
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
pd.set_option('display.width', 100)
pd.set_option('display.max_columns', 20)
plt.rcParams["figure.figsize"] = (12,8)
# Read the datafile "county_election_train.csv" as a Pandas dataframe
elect_train = pd.read_csv("data/county_election_train.csv")
# Read the datafile "county_election_test.csv" as a Pandas dataframe
elect_test = pd.read_csv("data/county_election_test.csv")
# Take a quick look at the dataframe
elect_train.head()
### edTest(test_response) ###
# Creating the response variable
# Label each row of the train data 1 if its "trump" value is greater than its "clinton" value, and 0 otherwise
y_train = ___
# Label each row of the test data 1 if its "trump" value is greater than its "clinton" value, and 0 otherwise
y_test = ___
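One possible way to build the 0/1 response, sketched on three hypothetical counties rather than the real data: compare the two columns row by row and cast the resulting booleans to integers. With a strict ">" comparison, an exact tie would be labelled 0 (Clinton).

```python
import pandas as pd

# Hypothetical vote shares for three counties (not from the exercise's CSV files)
df = pd.DataFrame({"trump": [0.62, 0.35, 0.51], "clinton": [0.33, 0.60, 0.45]})

# Row-wise comparison yields a boolean Series; astype(int) maps True/False to 1/0
y = (df["trump"] > df["clinton"]).astype(int)
print(y.tolist())  # [1, 0, 1]
```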
# Plot "minority" vs "bachelor" as a scatter plot
# Set colours blue for Trump and green for Clinton
# Your code here
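A minimal sketch of the colour-coded scatter plot using the hinted pyplot calls. It assumes "minority" goes on the x-axis and "bachelor" on the y-axis, and uses four hypothetical rows in place of elect_train and y_train; the Agg backend line exists only so the sketch runs headless and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; in the exercise these come from elect_train and y_train
df = pd.DataFrame({
    "minority": [5.0, 40.0, 12.0, 65.0],
    "bachelor": [18.0, 35.0, 22.0, 30.0],
})
y = pd.Series([1, 0, 1, 0])  # 1 = Trump carried the county, 0 = Clinton

is_trump = y == 1  # boolean mask aligned with df's index
plt.figure()
plt.scatter(df.loc[is_trump, "minority"], df.loc[is_trump, "bachelor"], c="blue", label="Trump")
plt.scatter(df.loc[~is_trump, "minority"], df.loc[~is_trump, "bachelor"], c="green", label="Clinton")
plt.xlabel("minority (%)")
plt.ylabel("bachelor (%)")
plt.legend()
```

Plotting each class with its own scatter call is what lets plt.legend() pick up one labelled entry per candidate.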
# Initialize a Decision Tree classifier with a maximum depth of 3 that uses
# the Gini impurity as its splitting criterion
dtree = ___
# Fit the classifier on the train data,
# using only the "minority" column as the predictor variable
___
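A sketch of fitting on a single column, again on hypothetical rows standing in for elect_train and y_train. Two points it illustrates: selecting the column with double brackets keeps a 2-D frame of shape (n_samples, 1), which fit() expects (a 1-D Series would be rejected), and the Gini impurity shown at the root of the plotted tree is 1 - sum_k p_k^2 over the class proportions p_k of the training labels.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training rows; the exercise uses elect_train and y_train instead
train = pd.DataFrame({"minority": [3.0, 8.0, 15.0, 22.0, 41.0, 55.0, 60.0, 72.0]})
y = np.array([1, 1, 1, 1, 0, 1, 0, 0])

dtree = DecisionTreeClassifier(max_depth=3, criterion="gini", random_state=0)
# Double brackets select a one-column DataFrame rather than a 1-D Series
dtree.fit(train[["minority"]], y)

# Gini impurity at the root, computed by hand from the class proportions
p = np.bincount(y) / len(y)           # [3/8, 5/8]
gini_root = 1 - np.sum(p ** 2)        # 1 - 9/64 - 25/64 = 30/64
print(gini_root, dtree.tree_.impurity[0])  # both values should be 0.46875
```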
# Code to set the size of the plot
plt.figure(figsize=(30,20))
# Plot the decision tree trained above, with the filled parameter set to True
tree.plot_tree(___)
plt.show();
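For reference, a sketch of a plot_tree call with filled=True on a hypothetical one-feature model. The feature_names and class_names arguments are optional readability additions, not something the exercise requires; class_names is listed in the order of dtree.classes_ (here [0, 1], so 0 maps to Clinton and 1 to Trump). The Agg backend line only lets the sketch run headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

# Hypothetical one-feature data, mirroring the minority-only model
X = np.array([[3.0], [8.0], [15.0], [22.0], [41.0], [55.0], [60.0], [72.0]])
y = np.array([1, 1, 1, 1, 0, 1, 0, 0])
dtree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

plt.figure(figsize=(30, 20))
# filled=True colours each node by its majority class and class purity
artists = tree.plot_tree(
    dtree,
    feature_names=["minority"],
    class_names=["Clinton", "Trump"],  # order follows dtree.classes_ = [0, 1]
    filled=True,
)
```

plot_tree returns the text annotations it draws, so the node labels can also be inspected programmatically.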