Key Word(s): EDA, Matplotlib, Effective Visualization, Graphical Integrity
Title :¶
Exercise: Visualization Improvisation
Description :¶
For this exercise we would like you to get creative and experiment!
You have the freedom to plot anything that you'd like from this data. You're expected to produce two plots, both of which should adhere to the principles learned in lecture (e.g., make it clear to understand/digest, effective, simple, not misleading, etc).
Please feel inspired to challenge yourself by making a type of plot you've never made before -- perhaps never even seen before! Give a brief explanation of the reason and usefulness of the plot.
Your data is the Boston housing prices dataset. We will load it directly from sklearn
Resource: for tons of great coding examples, visit the matplotlib website.
CS109A Introduction to Data Science
Lecture 14, Exercise: Visualization¶
Harvard University
Fall 2020
Instructors: Pavlos Protopapas, Kevin Rader, and Chris Tanner
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
# load the boston housing dataset
boston = load_boston()
boston_pd = pd.DataFrame(boston.data)
boston_pd.columns = boston.feature_names
boston_pd.describe()
CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 |
mean | 3.613524 | 11.363636 | 11.136779 | 0.069170 | 0.554695 | 6.284634 | 68.574901 | 3.795043 | 9.549407 | 408.237154 | 18.455534 | 356.674032 | 12.653063 |
std | 8.601545 | 23.322453 | 6.860353 | 0.253994 | 0.115878 | 0.702617 | 28.148861 | 2.105710 | 8.707259 | 168.537116 | 2.164946 | 91.294864 | 7.141062 |
min | 0.006320 | 0.000000 | 0.460000 | 0.000000 | 0.385000 | 3.561000 | 2.900000 | 1.129600 | 1.000000 | 187.000000 | 12.600000 | 0.320000 | 1.730000 |
25% | 0.082045 | 0.000000 | 5.190000 | 0.000000 | 0.449000 | 5.885500 | 45.025000 | 2.100175 | 4.000000 | 279.000000 | 17.400000 | 375.377500 | 6.950000 |
50% | 0.256510 | 0.000000 | 9.690000 | 0.000000 | 0.538000 | 6.208500 | 77.500000 | 3.207450 | 5.000000 | 330.000000 | 19.050000 | 391.440000 | 11.360000 |
75% | 3.677083 | 12.500000 | 18.100000 | 0.000000 | 0.624000 | 6.623500 | 94.075000 | 5.188425 | 24.000000 | 666.000000 | 20.200000 | 396.225000 | 16.955000 |
max | 88.976200 | 100.000000 | 27.740000 | 1.000000 | 0.871000 | 8.780000 | 100.000000 | 12.126500 | 24.000000 | 711.000000 | 22.000000 | 396.900000 | 37.970000 |
# our canonical example
plt.figure(figsize=(5, 4))
plt.hist(boston.target)
plt.title('Boston Housing Prices')
plt.xlabel('Price ($1000)')
plt.ylabel('# of Houses')
plt.show()
# YOUR FIRST PLOT
# YOUR SECOND PLOT