{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 1 - Basic Multi-classification\n",
"\n",
"# Description\n",
"The goal of this exercise is to get comfortable using multiclass classification models. Eventually, you will produce a plot similar to the one given below:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Figure: example plot of the classification decision boundaries.)*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Instructions: \n",
"We are trying to predict the type of iris in the classic Iris data set based on measured characteristics.\n",
"- Load the Iris data set and convert to a data frame.\n",
"- Fit multinomial & OvR logistic regressions and a $k$-NN model. \n",
"- Compute the accuracy of the models.\n",
"- Plot the classification boundaries against the two predictors used.\n",
"\n",
"# Hints:\n",
"\n",
"sklearn.linear_model.LogisticRegression() : Generates a logistic regression classifier\n",
"\n",
"model.fit() : Fits the model to the given data\n",
"\n",
"model.predict() : Uses the fitted model (logistic regression or $k$-NN classifier) to predict the class of each observation\n",
"\n",
"model.predict_proba() : Uses the fitted model to predict the probability of each class in the response (the probabilities should add up to 1 for each observation)\n",
"\n",
"LogisticRegression.coef_ and .intercept_ : Pull off the estimated $\\beta$ coefficients of a fitted logistic regression model\n",
"\n",
"model.score() : Returns the mean accuracy (accuracy classification score) on the given data and labels\n",
"\n",
"matplotlib.pyplot.pcolormesh() : Creates a pseudocolor plot over a rectangular grid, useful for shading classification regions\n",
"\n",
"**Note: This exercise is auto-graded and you can try multiple attempts.**"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"from sklearn import datasets\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.neighbors import KNeighborsClassifier \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Irises\n",
"\n",
"Read in the data set and convert to a Pandas data frame:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"raw = datasets.load_iris()\n",
"iris = pd.DataFrame(raw['data'],columns=raw['feature_names'])\n",
"iris['type'] = raw['target'] \n",
"iris.head()"
]
},
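{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `type` column holds integer class codes rather than species names. As a quick sanity check (a minimal sketch, assuming the standard `load_iris` bunch layout), the `target_names` array maps each code to its species:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Map the integer class codes (0, 1, 2) to their species names\n",
"print(dict(enumerate(raw['target_names'])))"
]
},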
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: this violin plot is 'inverted': it puts the model's response variable on the x-axis. This is fine for exploration."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"sns.violinplot(y=iris['sepal length (cm)'], x=iris['type']);"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Create a violin plot to compare petal length \n",
"# across the types of irises\n",
"\n",
"sns.violinplot(___);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we fit our first model (the OvR logistic) and print out the coefficients:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"logit_ovr = LogisticRegression(penalty='none', multi_class='ovr',max_iter = 1000).fit(\n",
" iris[['sepal length (cm)','sepal width (cm)']], iris['type'])\n",
"print(logit_ovr.intercept_)\n",
"print(logit_ovr.coef_)\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# we can predict classes or probabilities\n",
"print(logit_ovr.predict(iris[['sepal length (cm)','sepal width (cm)']])[0:5])\n",
"print(logit_ovr.predict_proba(iris[['sepal length (cm)','sepal width (cm)']])[0:5])"
]
},
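{
"cell_type": "markdown",
"metadata": {},
"source": [
"One property worth checking (a minimal sketch reusing the fitted `logit_ovr` from above): each row of `predict_proba` sums to 1, and `predict` returns the class with the largest probability in that row:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X = iris[['sepal length (cm)','sepal width (cm)']]\n",
"probs = logit_ovr.predict_proba(X)\n",
"# Each row of class probabilities should sum to 1\n",
"print(np.allclose(probs.sum(axis=1), 1))\n",
"# The predicted class is the argmax of each probability row\n",
"print(np.all(probs.argmax(axis=1) == logit_ovr.predict(X)))"
]
},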
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"# and calculate accuracy\n",
"print(logit_ovr.score(iris[['sepal length (cm)','sepal width (cm)']],iris['type']))"
]
},
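{
"cell_type": "markdown",
"metadata": {},
"source": [
"`score` here is just the mean accuracy: the fraction of observations whose predicted class matches the true class. A hand-rolled version (a sketch reusing the fitted `logit_ovr`) should agree with it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Accuracy computed by hand: fraction of correct predictions\n",
"preds = logit_ovr.predict(iris[['sepal length (cm)','sepal width (cm)']])\n",
"manual_accuracy = np.mean(preds == iris['type'])\n",
"print(manual_accuracy)"
]
},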
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now it's your turn: but this time with the multinomial logistic regression."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_multinomial) ###\n",
"\n",
"# Fit the model and print out the coefficients\n",
"logit_multi = LogisticRegression(___).fit(___)\n",
"intercept = logit_multi.intercept_\n",
"coefs = logit_multi.coef_\n",
"print(intercept)\n",
"print(coefs)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_multinomialaccuracy) ###\n",
"\n",
"multi_accuracy = ___\n",
"print(multi_accuracy)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"# Plot the decision boundary. \n",
"x1_range = iris['sepal length (cm)'].max() - iris['sepal length (cm)'].min()\n",
"x2_range = iris['sepal width (cm)'].max() - iris['sepal width (cm)'].min()\n",
"x1_min, x1_max = iris['sepal length (cm)'].min()-0.1*x1_range, iris['sepal length (cm)'].max() +0.1*x1_range\n",
"x2_min, x2_max = iris['sepal width (cm)'].min()-0.1*x2_range, iris['sepal width (cm)'].max() + 0.1*x2_range\n",
"\n",
"step = .05 \n",
"x1x, x2x = np.meshgrid(np.arange(x1_min, x1_max, step), np.arange(x2_min, x2_max, step))\n",
"y_hat_ovr = logit_ovr.predict(np.c_[x1x.ravel(), x2x.ravel()])\n",
"y_hat_multi = ___\n",
"\n",
"\n",
"fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))\n",
"\n",
"ax1.pcolormesh(x1x, x2x, y_hat_ovr.reshape(x1x.shape), cmap=plt.cm.Paired,alpha = 0.5)\n",
"ax1.scatter(iris['sepal length (cm)'], iris['sepal width (cm)'], c=iris['type'], edgecolors='k', cmap=plt.cm.Paired)\n",
"\n",
"### your job is to create the same plot, but for the multinomial\n",
"#####\n",
"# your code here\n",
"#####\n",
"\n",
"plt.show()"
]
},
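{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the grid bookkeeping above looks opaque: `np.meshgrid` expands the two coordinate vectors into a full rectangular grid, and `np.c_` stacks the flattened coordinates into a two-column array that the classifier accepts as predictors. A tiny self-contained sketch of the shapes involved:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy grid: 3 x1 values crossed with 2 x2 values\n",
"a, b = np.meshgrid(np.arange(0, 3), np.arange(0, 2))\n",
"print(a.shape)  # (2, 3): one row per x2 value, one column per x1 value\n",
"grid = np.c_[a.ravel(), b.ravel()]\n",
"print(grid.shape)  # (6, 2): every (x1, x2) pair as a row"
]
},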
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"#fit a knn model (k=5) for the same data \n",
"knn5 = KNeighborsClassifier(___).fit(___)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_knnaccuracy) ###\n",
"\n",
"#Calculate the accuracy\n",
"knn5_accuracy = ___\n",
"print(knn5_accuracy)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"# and plot the classification boundary\n",
"\n",
"y_hat_knn5 = knn5.predict(np.c_[x1x.ravel(), x2x.ravel()])\n",
"\n",
"fig, ax1 = plt.subplots(1, 1, figsize=(8, 6))\n",
"\n",
"ax1.pcolormesh(x1x, x2x, y_hat_knn5.reshape(x1x.shape), cmap=plt.cm.Paired,alpha = 0.5)\n",
"# Plot also the training points\n",
"ax1.scatter(iris['sepal length (cm)'], iris['sepal width (cm)'], c=iris['type'], edgecolors='k', cmap=plt.cm.Paired)\n",
"\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}