{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Title\n",
"\n",
"**Exercise - Multi-class Classification and PCA**\n",
"\n",
"# Description\n",
"\n",
"This exercise follows more like a lab and is broken into 3 parts (we will reconvene after each part). The 3 parts are:\n",
"1. Data set exploration and baseline models (graded)\n",
"2. PCA and PCR (graded)\n",
"3. Going Further with the dataset (not graded)\n",
"\n",
"# Learning Goals\n",
"\n",
"In this lab, we will look at how to use PCA to reduce a dataset to a smaller number of dimensions. The goals are:\n",
"- Better Understand the multiclass setting\n",
"- Understand what PCA is and why it's useful\n",
"- Feel comfortable performing PCA on a new dataset\n",
"- Understand what it means for each component to capture variance from the original dataset\n",
"- Be able to extract the `variance explained` by components\n",
"- Perform modeling with the PCA components\n",
"\n",
"# Hints:\n",
"\n",
"sklearn.accuracy_score : Generates a Logistic Regression Model\n",
"\n",
"sklearn.LogisticRegression : Generates a Logistic Regression Model\n",
"\n",
"sklearn.LogisticRegressionCV : Uses CV to perform regularization in Logistic Regression\n",
"\n",
"sklearn.PCA : Create a Principal Component Analysis (PCA) decomposition\n",
"\n",
"**Note: This exercise is auto-graded and you can try multiple attempts.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CS-109A Introduction to Data Science\n",
"\n",
"\n",
"## Lecture 21: More on Classification and PCA\n",
"\n",
"**Harvard University**
\n",
"**Fall 2020**
\n",
"**Instructors:** Pavlos Protopapas, Kevin Rader, Chris Tanner
\n",
"**Contributors:** Kevin Rader, Eleni Kaxiras, Chris Tanner, Will Claybaugh, David Sondak\n",
"\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"## RUN THIS CELL TO PROPERLY HIGHLIGHT THE EXERCISES\n",
"import requests\n",
"from IPython.core.display import HTML\n",
"styles = requests.get(\"https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css\").text\n",
"HTML(styles)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning Goals\n",
"In this lab, we will look at how to use PCA to reduce a dataset to a smaller number of dimensions. The goals are:\n",
"