Key Word(s): requests, web scraping, pandas, beautiful soup, parsing, eda
CS109A Introduction to Data Science
Lecture 3, Exercise 2: PANDAS Intro
Harvard University
Fall 2020
Instructors: Pavlos Protopapas, Kevin Rader, and Chris Tanner
NOTE: After running every cell, be sure to auto-grade your work by clicking 'Mark' in the lower-right corner. Otherwise, no credit will be given.
import pandas as pd
For this exercise, we will be working with the CS109 First Day survey results!
# import the CSV file
df = pd.read_csv("cs109a_student_survey.csv")
PANDAS Basics
Let's get started with basic functionality of PANDAS!
In the cell below, fill in the blank so that the variable cols stores df's column names. NOTE: Please keep the data structure's original type (a pandas Index); you do not have to convert it to a list.
### edTest(test_a) ###
cols = ____
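A minimal sketch of the idea behind this prompt, assuming a small hypothetical DataFrame built from the Name/Age illustration used later in this exercise (it stands in for the survey data). It shows that .columns hands back a pandas Index rather than a plain list.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame
toy = pd.DataFrame({"Name": ["Enrique", "Sheila", "Marcy", "Utibe"],
                    "Age": [25, 67, 21, 33]})

# .columns returns a pandas Index object, not a Python list
toy_cols = toy.columns
print(isinstance(toy_cols, pd.Index))  # True
print(list(toy_cols))  # ['Name', 'Age']
```

Converting with list() is only needed when plain-list behavior is wanted; the Index itself already supports iteration and membership tests.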
In the cell below, fill in the blank so that num_cols stores the number of columns in df.
### edTest(test_b) ###
num_rows = df.shape[0]
num_cols = ____
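As a sketch using the same hypothetical toy DataFrame (not the survey data), .shape is a (rows, columns) tuple, so the row count used above and the column count come from the same attribute.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame
toy = pd.DataFrame({"Name": ["Enrique", "Sheila", "Marcy", "Utibe"],
                    "Age": [25, 67, 21, 33]})

# .shape is a (number of rows, number of columns) tuple
n_rows, n_cols = toy.shape
print(n_rows, n_cols)  # 4 2
```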
In the cell below, fill in the blank so that sneak_peak stores the first 7 rows of df.
### edTest(test_c) ###
sneak_peak = ____
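A minimal sketch, assuming the hypothetical four-row toy DataFrame below in place of the survey data: DataFrame.head(n) returns the first n rows (n defaults to 5 when omitted).

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame
toy = pd.DataFrame({"Name": ["Enrique", "Sheila", "Marcy", "Utibe"],
                    "Age": [25, 67, 21, 33]})

# head(n) returns the first n rows as a new DataFrame
first_two = toy.head(2)
print(first_two["Name"].tolist())  # ['Enrique', 'Sheila']
```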
In the cell below, fill in the blank so that the_end stores the last 4 rows of df.
### edTest(test_d) ###
the_end = ____
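The counterpart of head() is DataFrame.tail(n), which returns the last n rows. A short sketch on the same hypothetical toy DataFrame (not the survey data):

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame
toy = pd.DataFrame({"Name": ["Enrique", "Sheila", "Marcy", "Utibe"],
                    "Age": [25, 67, 21, 33]})

# tail(n) returns the last n rows; the original index labels are kept
last_two = toy.tail(2)
print(last_two["Name"].tolist())  # ['Marcy', 'Utibe']
print(last_two.index.tolist())  # [2, 3]
```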
In the cell below, fill in the blank so that the python_experiences variable stores a list of the 5 distinct values found within the Python experience column of df.
### edTest(test_e) ###
python_experiences = ________
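A minimal sketch of collecting a column's distinct values, assuming a hypothetical Timezone column with made-up values (not the survey's actual entries). Series.unique() returns an array in order of first appearance, and list() turns it into a Python list.

```python
import pandas as pd

# Hypothetical column with repeated values
tz = pd.DataFrame({"Timezone": ["UTC-5", "UTC+1", "UTC-5", "UTC+8", "UTC+1"]})

# unique() keeps the order of first appearance; list() converts the array
distinct = list(tz["Timezone"].unique())
print(distinct)  # ['UTC-5', 'UTC+1', 'UTC+8']

# nunique() counts the distinct values directly
print(tz["Timezone"].nunique())  # 3
```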
In the cell below, fill in the blank so that the inventor variable stores the DataFrame row(s) that correspond to everyone who is an "Inventor of Python".
### edTest(test_f) ###
inventor = ________
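Selecting rows by a condition is usually done with a boolean mask. The sketch below assumes the hypothetical toy DataFrame (not the survey data) and shows that the result is still a DataFrame even when only one row matches.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey DataFrame
toy = pd.DataFrame({"Name": ["Enrique", "Sheila", "Marcy", "Utibe"],
                    "Age": [25, 67, 21, 33]})

# Comparing a column to a value yields a boolean Series (True where it matches)
mask = toy["Name"] == "Marcy"

# Indexing the DataFrame with the mask keeps only the matching rows
marcy_rows = toy[mask]
print(type(marcy_rows).__name__)  # DataFrame
print(marcy_rows.index.tolist())  # [2]
```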
In the cell below, fill in the blank so that the utc1 variable stores the DataFrame rows that correspond to everyone who has a Timezone value of UTC+1 (most of mainland Europe).
### edTest(test_g) ###
utc1 = ____________
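The same masking idea works through .loc and can match several rows at once. In this sketch the Timezone assignments are hypothetical, made up for illustration; note that the filtered rows keep their original index labels.

```python
import pandas as pd

# Hypothetical names and time zones, not taken from the survey
people = pd.DataFrame({"Name": ["Enrique", "Sheila", "Marcy", "Utibe"],
                       "Timezone": ["UTC-5", "UTC+1", "UTC+8", "UTC+1"]})

# .loc accepts a boolean mask just like plain [] indexing
in_utc1 = people.loc[people["Timezone"] == "UTC+1"]
print(in_utc1["Name"].tolist())  # ['Sheila', 'Utibe']
print(in_utc1.index.tolist())  # [1, 3]
```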
In the cell below, fill in the blank so that the row56 variable stores the 56th row of df. To be clear, imagine our DataFrame looked as follows:
      Name  Age
0  Enrique   25
1   Sheila   67
2    Marcy   21
3    Utibe   33
We'd say the 1st row is the one with Enrique, the 2nd row is the one with Sheila, the 3rd row is the one with Marcy, etc.
### edTest(test_h) ###
row56 = ________
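Counting rows by position maps onto .iloc, which is zero-based, so the n-th row lives at position n - 1. The nth_row helper below is hypothetical, written only for illustration, and the DataFrame is the toy one from the table above.

```python
import pandas as pd

# Toy data from the Name/Age illustration above
toy = pd.DataFrame({"Name": ["Enrique", "Sheila", "Marcy", "Utibe"],
                    "Age": [25, 67, 21, 33]})

def nth_row(frame, n):
    """Hypothetical helper: return the n-th row (1-based, by position) as a Series."""
    return frame.iloc[n - 1]

third = nth_row(toy, 3)
print(third["Name"])  # Marcy
```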
In the cell below, fill in the blank so that sorted_df stores df after sorting it by the Name column in ascending order (A -> Z).
### edTest(test_i) ###
sorted_df = ________
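A sketch of sorting with DataFrame.sort_values on the toy Name/Age data (not the survey data). Sorting is ascending by default and returns a new DataFrame, leaving the original unchanged; the index labels travel with their rows.

```python
import pandas as pd

# Toy data from the Name/Age illustration in this exercise
toy = pd.DataFrame({"Name": ["Enrique", "Sheila", "Marcy", "Utibe"],
                    "Age": [25, 67, 21, 33]})

# ascending=True is the default; the result is a new DataFrame
sorted_toy = toy.sort_values(by="Name")
print(sorted_toy["Name"].tolist())  # ['Enrique', 'Marcy', 'Sheila', 'Utibe']
print(sorted_toy.index.tolist())  # [0, 2, 1, 3]
```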
In the cell below, fill in the blank so that sorted_row56 stores the 56th row of sorted_df. To be clear, imagine our sorted DataFrame looked as follows:
      Name  Age
0  Enrique   25
2    Marcy   21
1   Sheila   67
3    Utibe   33
We'd say the 1st row is the one with Enrique, the 2nd row is the one with Marcy, the 3rd row is the one with Sheila, etc., counting by position in the sorted order rather than by index label.
### edTest(test_j) ###
sorted_row56 = ________
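After sorting, position and index label no longer agree, which is where .iloc (by position) and .loc (by label) give different answers. A short sketch on the toy Name/Age data:

```python
import pandas as pd

# Toy data from the Name/Age illustration in this exercise
toy = pd.DataFrame({"Name": ["Enrique", "Sheila", "Marcy", "Utibe"],
                    "Age": [25, 67, 21, 33]})
sorted_toy = toy.sort_values(by="Name")

# .iloc counts positions in the current (sorted) order
print(sorted_toy.iloc[1]["Name"])  # Marcy
# .loc looks up the row whose index label is 1
print(sorted_toy.loc[1]["Name"])  # Sheila
```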