Title

Exercise: Visualization

Description

For this exercise, we will continue to work with the Boston housing prices dataset that comes with sklearn (as we did in Lecture 13). Details about the dataset and its columns are available here.

In this exercise, I want you all to get creative and experiment! Instead of plotting exactly what we ask for, think about what you would be interested in plotting and exploring. You are free to plot anything you like from this data. You are expected to produce two plots, both of which should adhere to the principles learned in lecture (e.g., clear and easy to digest, effective, simple, not misleading). Challenge yourself by making a type of plot you've never made before, perhaps one you've never even seen before! Further, you are not confined to matplotlib; you can use any Python visualization library you want.

We load the data into a pandas DataFrame for you, in case you find this helpful. Feel free to ignore this DataFrame if you would rather work directly with the data. It's totally up to you!
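As a rough illustration of the two options, the sketch below pulls the same column either by name from a DataFrame or by position from the raw array. The three rows of values are made-up stand-ins rather than the real dataset, and the variable names are chosen only for this example.

```python
import numpy as np
import pandas as pd

# Stand-in values only (NOT the real dataset): three rows, three of the columns.
feature_names = ["CRIM", "NOX", "RM"]
data = np.array([
    [0.1, 0.5, 6.5],
    [0.2, 0.4, 7.0],
    [3.0, 0.6, 5.5],
])

# Option 1: wrap the array in a DataFrame and select the column by name.
df = pd.DataFrame(data, columns=feature_names)
rooms_from_df = df["RM"].to_numpy()

# Option 2: keep the raw array and look up the column position yourself.
rm_index = feature_names.index("RM")
rooms_from_array = data[:, rm_index]

# Both routes give the same values; the DataFrame just adds labels.
same = np.array_equal(rooms_from_df, rooms_from_array)
print(same)
```

Selecting by name tends to be easier to read in plotting code, while the raw array avoids the extra dependency on pandas.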

Resource: for tons of great coding examples, visit the matplotlib website.

CS109A Introduction to Data Science

Lecture 14, Exercise: Visualization

Harvard University
Fall 2020
Instructors: Pavlos Protopapas, Kevin Rader, and Chris Tanner


In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
In [7]:
# load the Boston housing dataset
# (load_boston was removed in scikit-learn 1.2, so this cell needs an older version)
boston = load_boston()
boston_pd = pd.DataFrame(boston.data, columns=boston.feature_names)
boston_pd.describe()
Out[7]:
             CRIM          ZN       INDUS        CHAS         NOX          RM         AGE         DIS         RAD         TAX     PTRATIO           B       LSTAT
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000
mean     3.613524   11.363636   11.136779    0.069170    0.554695    6.284634   68.574901    3.795043    9.549407  408.237154   18.455534  356.674032   12.653063
std      8.601545   23.322453    6.860353    0.253994    0.115878    0.702617   28.148861    2.105710    8.707259  168.537116    2.164946   91.294864    7.141062
min      0.006320    0.000000    0.460000    0.000000    0.385000    3.561000    2.900000    1.129600    1.000000  187.000000   12.600000    0.320000    1.730000
25%      0.082045    0.000000    5.190000    0.000000    0.449000    5.885500   45.025000    2.100175    4.000000  279.000000   17.400000  375.377500    6.950000
50%      0.256510    0.000000    9.690000    0.000000    0.538000    6.208500   77.500000    3.207450    5.000000  330.000000   19.050000  391.440000   11.360000
75%      3.677083   12.500000   18.100000    0.000000    0.624000    6.623500   94.075000    5.188425   24.000000  666.000000   20.200000  396.225000   16.955000
max     88.976200  100.000000   27.740000    1.000000    0.871000    8.780000  100.000000   12.126500   24.000000  711.000000   22.000000  396.900000   37.970000
In [8]:
# our canonical example
plt.figure(figsize=(5, 4))
plt.hist(boston.target)
plt.title('Boston Housing Prices')
plt.xlabel('Price ($1000s)')
plt.ylabel('# of Houses')
plt.show()
In [4]:
# YOUR FIRST PLOT
In [5]:
# YOUR SECOND PLOT
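If you're unsure where to start, here is one possible shape a plot could take: a scatter plot of price against number of rooms, colored by a third variable. This is only a sketch, not a model answer. The arrays are randomly generated stand-ins (so the block runs without the dataset), and the assumed relationship between them is invented for illustration; swap in the real RM, LSTAT, and target values before drawing any conclusions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (NOT the real dataset); replace with the real columns.
rng = np.random.default_rng(0)
n = 200
rooms = rng.normal(6.3, 0.7, n)           # stand-in for RM
lower_status = rng.uniform(2, 35, n)      # stand-in for LSTAT
noise = rng.normal(0, 3, n)
price = np.clip(22 + 8 * (rooms - 6.3) - 0.5 * (lower_status - 12) + noise, 5, 50)

fig, ax = plt.subplots(figsize=(6, 4))

# Position encodes rooms and price; color encodes the third variable.
points = ax.scatter(rooms, price, c=lower_status, cmap="viridis", s=15, alpha=0.8)

# Label everything so the plot can be read without the surrounding text.
ax.set_title("Price vs. Rooms (synthetic stand-in data)")
ax.set_xlabel("Average number of rooms per dwelling (RM)")
ax.set_ylabel("Price ($1000s)")
cbar = fig.colorbar(points, ax=ax)
cbar.set_label("Lower-status population, % (LSTAT)")

fig.tight_layout()
```

In a notebook, drop the `matplotlib.use("Agg")` line and finish the cell with `plt.show()`. A sequential colormap such as viridis is used here because the color variable is ordered from low to high.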