Title
Exercise 1 - Exploration, Wrangling, and Defining a Question
Description
Breakout #1 Tasks (15-20min):
- Someone share (the person who resides closest to the Bahamas…thanks Columbus). Someone different will share in the next breakout.
- Explore the data (some of that is done for you in the code below). Please do a little more exploration.
- Come up with an interesting question or two you can answer with this data set. Come up with a question or two that can be answered with supplemental data:
- start with ideal, and then get more practical based on what is likely available.
In [ ]:
import pandas as pd
import sys
import numpy as np
import sklearn as sk
import scipy as sp
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
%matplotlib inline
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neighbors import KNeighborsRegressor
import sklearn.metrics as met
from sklearn.preprocessing import PolynomialFeatures
In [ ]:
movies = pd.read_csv('tmdb_5000_movies.csv')
credits = pd.read_csv('tmdb_5000_credits.csv')
movies.head()
In [ ]:
credits.head()
In [ ]:
print(movies.dtypes)
quants = movies.columns[(movies.dtypes == "int64") | (movies.dtypes == "float64") ].values
quants = quants[quants!='id']
In [ ]:
pd.Series(np.append(quants,'year'))
In [ ]:
# Parse the release date, then derive calendar features from it
movies['release_date'] = pd.to_datetime(movies['release_date'])
movies['year'] = movies['release_date'].dt.year
movies['month'] = movies['release_date'].dt.month
# Floor the year to its decade, e.g. 1997 -> 1990
movies['decade'] = (movies['year'] // 10) * 10
In [ ]:
# idxmin/idxmax return index labels and skip missing dates (NaT)
oldest = movies['release_date'].idxmin()
newest = movies['release_date'].idxmax()
print("Oldest Movie:", movies.loc[oldest, 'title'], "in", movies.loc[oldest, 'release_date'])
print("Newest Movie:", movies.loc[newest, 'title'], "in", movies.loc[newest, 'release_date'])
In [ ]:
sns.pairplot(movies[np.append(quants,'year')]);
In [ ]:
movies_raw = movies.copy()
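The cells above load `credits` but never combine it with `movies`. Below is a minimal sketch of one way to join the two tables, run on toy stand-in frames so it is self-contained. The key names `id` and `movie_id` are assumptions about the CSV files, so check `movies.columns` and `credits.columns` before adapting it.

```python
import pandas as pd

# Toy stand-ins for the real frames. The key column names
# (`id` in movies, `movie_id` in credits) are assumptions.
movies_toy = pd.DataFrame({"id": [1, 2, 3], "title": ["A", "B", "C"]})
credits_toy = pd.DataFrame({"movie_id": [1, 2, 4], "cast": ["[]", "[]", "[]"]})

# Left join keeps every movie; validate raises if a key is duplicated,
# and indicator adds a `_merge` column showing where each row came from.
merged = movies_toy.merge(
    credits_toy,
    left_on="id",
    right_on="movie_id",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Movies with no matching credits row
unmatched = merged.loc[merged["_merge"] == "left_only", "title"].tolist()
print(unmatched)
```

A left join is used here so that no movie is silently dropped; an inner join would be a reasonable alternative if only fully matched rows are wanted.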
Breakout 1 Tasks (15-20min):
- Someone share (the person who resides closest to the Bahamas…thanks Columbus). Someone different will share in the next breakout.
- Explore the data (some of that is done for you above). Please do a little more exploration and wrangling.
- Come up with an interesting question or two you can answer with this data set. Come up with a question or two that can be answered with supplemental data:
- start with ideal, and then get more practical based on what is likely available.
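As a starting point for the extra exploration and wrangling, here is a rough sketch on a toy frame. It assumes the real data has `budget`, `revenue`, and a JSON-encoded `genres` column, and that zeros in the money columns mean "not recorded"; both are assumptions to verify against `movies` before relying on them.

```python
import json
import pandas as pd

# Toy stand-in for `movies`. Column names and the JSON layout of
# `genres` are assumptions about the TMDB file.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "budget": [100, 0, 50],
    "revenue": [300, 20, 0],
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
        '[{"id": 18, "name": "Drama"}]',
        '[]',
    ],
})

def genre_names(raw):
    """Return the list of genre names from a JSON-encoded string."""
    return [entry["name"] for entry in json.loads(raw)]

toy["genre_list"] = toy["genres"].apply(genre_names)

# Treat zero budget/revenue as missing (an assumption, not a known fact)
money = toy[["budget", "revenue"]].where(toy[["budget", "revenue"]] > 0)
toy["profit"] = money["revenue"] - money["budget"]

# One row per (movie, genre) makes per-genre summaries straightforward
by_genre = toy.explode("genre_list").groupby("genre_list")["profit"].mean()
print(by_genre)
```

From here, a question such as "which genres tend to be most profitable?" can be checked directly, and the rows with missing money values point to where supplemental data would help.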
In [ ]: