{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CS109A Introduction to Data Science \n", "\n", "## Standard Section 4: Regularization and Model Selection\n", "\n", "**Harvard University**
\n", "**Fall 2019**
\n", "**Instructors**: Pavlos Protopapas, Kevin Rader, and Chris Tanner
\n", "**Section Leaders**: Marios Mattheakis, Abhimanyu (Abhi) Vasishth, Robbert (Rob) Struyven
\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#RUN THIS CELL \n", "import requests\n", "from IPython.core.display import HTML\n", "styles = requests.get(\"http://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css\").text\n", "HTML(styles)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this section, our goal is to get you familiarized with Regularization in Multiple Linear Regression and to start thinking about Model and Hyper-Parameter Selection. \n", "\n", "Specifically, we will:\n", "\n", "- Load in the King County House Price Dataset\n", "- Perform some basic EDA\n", "- Split the data up into a training, **validation**, and test set (we'll see why we need a validation set)\n", "- Scale the variables (by standardizing them) and seeing why we need to do this\n", "- Make our multiple & polynomial regression models (like we did in the previous section)\n", "- Learn what **regularization** is and how it can help\n", "- Understand **ridge** and **lasso** regression\n", "- Get an introduction to **cross-validation** using RidgeCV and LassoCV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Data and Stats packages\n", "import numpy as np\n", "import pandas as pd\n", "pd.set_option('max_columns', 200)\n", "\n", "# Visualization packages\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "sns.set()\n", "\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# EDA: House Prices Data From Kaggle\n", "\n", "For our dataset, we'll be using the house price dataset from [King County, WA](https://en.wikipedia.org/wiki/King_County,_Washington). 
The dataset is from [Kaggle](https://www.kaggle.com/harlfoxem/housesalesprediction). \n", "\n", "The task is to build a regression model to **predict the price**, based on different attributes. First, let's do some EDA." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(4000, 21)\n", "id int64\n", "date object\n", "price float64\n", "bedrooms int64\n", "bathrooms float64\n", "sqft_living int64\n", "sqft_lot int64\n", "floors float64\n", "waterfront int64\n", "view int64\n", "condition int64\n", "grade int64\n", "sqft_above int64\n", "sqft_basement int64\n", "yr_built int64\n", "yr_renovated int64\n", "zipcode int64\n", "lat float64\n", "long float64\n", "sqft_living15 int64\n", "sqft_lot15 int64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " id date price bedrooms bathrooms \\\n", "735 2591820310 20141006T000000 365000.0 4 2.25 \n", "2830 7974200820 20140821T000000 865000.0 5 3.00 \n", "4106 7701450110 20140815T000000 1038000.0 4 2.50 \n", "16218 9522300010 20150331T000000 1490000.0 3 3.50 \n", "19964 9510861140 20140714T000000 711000.0 3 2.50 \n", "\n", " sqft_living sqft_lot floors waterfront view condition grade \\\n", "735 2070 8893 2.0 0 0 4 8 \n", "2830 2900 6730 1.0 0 0 5 8 \n", "4106 3770 10893 2.0 0 2 3 11 \n", "16218 4560 14608 2.0 0 2 3 12 \n", "19964 2550 5376 2.0 0 0 3 9 \n", "\n", " sqft_above sqft_basement yr_built yr_renovated zipcode lat \\\n", "735 2070 0 1986 0 98058 47.4388 \n", "2830 1830 1070 1977 0 98115 47.6784 \n", "4106 3770 0 1997 0 98006 47.5646 \n", "16218 4560 0 1990 0 98034 47.6995 \n", "19964 2550 0 2004 0 98052 47.6647 \n", "\n", " long sqft_living15 sqft_lot15 \n", "735 -122.162 2390 7700 \n", "2830 -122.285 2370 6283 \n", "4106 -122.129 3710 9685 \n", "16218 -122.228 4050 14226 \n", "19964 -122.083 2250 4050 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load the dataset \n", "house_df = pd.read_csv('../data/kc_house_data.csv')\n", "house_df = house_df.sample(frac=1, random_state=42)[0:4000]\n", "print(house_df.shape)\n", "print(house_df.dtypes)\n", "house_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's check for null values and look at the datatypes within the dataset." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 4000 entries, 735 to 3455\n", "Data columns (total 21 columns):\n", "id 4000 non-null int64\n", "date 4000 non-null object\n", "price 4000 non-null float64\n", "bedrooms 4000 non-null int64\n", "bathrooms 4000 non-null float64\n", "sqft_living 4000 non-null int64\n", "sqft_lot 4000 non-null int64\n", "floors 4000 non-null float64\n", "waterfront 4000 non-null int64\n", "view 4000 non-null int64\n", "condition 4000 non-null int64\n", "grade 4000 non-null int64\n", "sqft_above 4000 non-null int64\n", "sqft_basement 4000 non-null int64\n", "yr_built 4000 non-null int64\n", "yr_renovated 4000 non-null int64\n", "zipcode 4000 non-null int64\n", "lat 4000 non-null float64\n", "long 4000 non-null float64\n", "sqft_living15 4000 non-null int64\n", "sqft_lot15 4000 non-null int64\n", "dtypes: float64(5), int64(15), object(1)\n", "memory usage: 687.5+ KB\n" ] } ], "source": [ "house_df.info()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id price bedrooms bathrooms sqft_living \\\n", "count 4.000000e+03 4.000000e+03 4000.000000 4000.000000 4000.000000 \n", "mean 4.586542e+09 5.497522e+05 3.379250 2.116563 2096.645250 \n", "std 2.876700e+09 3.890505e+05 0.922568 0.783175 957.785141 \n", "min 1.000102e+06 8.250000e+04 0.000000 0.000000 384.000000 \n", "25% 2.126074e+09 3.249500e+05 3.000000 1.750000 1420.000000 \n", "50% 3.889350e+09 4.550000e+05 3.000000 2.250000 1920.000000 \n", "75% 7.334526e+09 6.541250e+05 4.000000 2.500000 2570.000000 \n", "max 9.842300e+09 5.570000e+06 11.000000 8.000000 13540.000000 \n", "\n", " sqft_lot floors waterfront view condition \\\n", "count 4.000000e+03 4000.000000 4000.000000 4000.000000 4000.000000 \n", "mean 1.616511e+04 1.475000 0.007750 0.232500 3.420750 \n", "std 5.120888e+04 0.530279 0.087703 0.768174 0.646393 \n", "min 5.720000e+02 1.000000 0.000000 0.000000 1.000000 \n", "25% 5.200000e+03 1.000000 0.000000 0.000000 3.000000 \n", "50% 7.675000e+03 1.000000 0.000000 0.000000 3.000000 \n", "75% 1.087125e+04 2.000000 0.000000 0.000000 4.000000 \n", "max 1.651359e+06 3.500000 1.000000 4.000000 5.000000 \n", "\n", " grade sqft_above sqft_basement yr_built yr_renovated \\\n", "count 4000.000000 4000.000000 4000.00000 4000.000000 4000.000000 \n", "mean 7.668250 1792.465000 304.18025 1970.564250 89.801500 \n", "std 1.194173 849.986192 455.26354 29.141872 413.760082 \n", "min 4.000000 384.000000 0.00000 1900.000000 0.000000 \n", "25% 7.000000 1180.000000 0.00000 1951.000000 0.000000 \n", "50% 7.000000 1550.000000 0.00000 1974.500000 0.000000 \n", "75% 8.000000 2250.000000 590.00000 1995.000000 0.000000 \n", "max 13.000000 9410.000000 4130.00000 2015.000000 2015.000000 \n", "\n", " zipcode lat long sqft_living15 sqft_lot15 \n", "count 4000.000000 4000.000000 4000.000000 4000.00000 4000.00000 \n", "mean 98078.035500 47.560091 -122.214060 1997.75900 12790.67800 \n", "std 54.073374 0.139070 0.141879 701.60987 26085.20301 \n", "min 98001.000000 
47.155900 -122.515000 620.00000 659.00000 \n", "25% 98033.000000 47.468175 -122.328000 1490.00000 5200.00000 \n", "50% 98065.000000 47.573800 -122.231000 1840.00000 7628.00000 \n", "75% 98118.000000 47.679100 -122.127000 2370.00000 10240.00000 \n", "max 98199.000000 47.777500 -121.315000 5790.00000 560617.00000 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "house_df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's choose a subset of columns here. **NOTE**: The way I'm selecting columns here is not principled and is just for convenience. In your homework assignments (and in real life), we expect you to choose columns more rigorously.\n", "\n", "1. `bedrooms`\n", "2. `bathrooms`\n", "3. `sqft_living`\n", "4. `sqft_lot`\n", "5. `floors`\n", "6. `sqft_above`\n", "7. `sqft_basement`\n", "8. `lat`\n", "9. `long`\n", "10. **`price`**: Our response variable" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "cols_of_interest = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'sqft_above', 'sqft_basement',\n", " 'lat', 'long', 'price']\n", "house_df = house_df[cols_of_interest]\n", "\n", "# Convert house price to 1000s of dollars\n", "house_df['price'] = house_df['price']/1000" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see how the response variable (`price`) is distributed" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAsgAAAE/CAYAAABb1DVKAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAfGklEQVR4nO3de5xcZZ3n8U/nHk1HJDQvQBBv5AfeiHLRWURQEBeHTGYW0BUQ4gjKCursBl13xAuMdwkoDuIFGHSj4kxAYuSyjIqCF1RQ4gX5iQ6gQGbF6JiAuSfzxzkND91V6UuqurrSn/frlVdSTz11zq/O05361lNPndOzdetWJEmSJFUmdboASZIkaTwxIEuSJEkFA7IkSZJUMCBLkiRJBQOyJEmSVDAgS5IkSYUpnS5AUvtFxFagLzN/X7QtBI7LzGMi4lzgV5n5uW1s413Aisxc1vaCWywi5gFXAv8BHJuZ9xT3DTo23SAiDgTenpnHtWh7k4FlwOuAvwF2yswPjmI7RwEfzsx5RdtfAh8ApgM/AV6XmavrfS4G/ivV69F5mfnJ+jH7AJcCuwAPASdn5p3b8RT7a5kMXAXsB1yYmf+4jb5fBZZm5uXb6LOQR3+Pvgn8Y2YubUGdBwBvyMzXb++2JI2cAVkSmfmuYXR7KXBHu2tpk78CbszMUztdSKtk5q1AS8JxbRHwzcz8/8AnR/rgiJgJvAM4A7i/aO8D/gk4JDPviogPAR8E3gi8AZgLPBvoBb4XET/KzB8Anwc+mplfiIijgaUR8ZzM3N6T9z8JeDnw+MzcvJ3bapvMvC0ipkTEMZn51U7XI000BmRJRMTlwM8y87yIOIdqBnEDsApYCPw34EDgIxGxGfgGcBEwD9gKXAf8fWZuiohXAB8CNgO3A0cCLwIOp5qdfDzwJ+AY4GJgH2AOsAY4ITOznom7DXghsCvwaWA34LD68a/MzJ82eB7vBF4NbAJ+CZwJHEEVxiZHxMzMPLHBITgnIl5Y1/GRzLyo2fYy898HzhSWtxsdv8xcGRH7AR+r9zGZavbysgbP4R7gi8DLgJ2AxZl5cUQcXj/+YWAW8Nb6vmdHxCzg48Ahda1XU4XVqfVYHFbv88fAmzNz9YB9Pg74O+A59e33ALtk5pl1PZfXx/HJwOcy850NjuHLqcbmFOD9RftRwA8z86769sXAiog4oz5On87MTcAfI+IK4KSIuB/YF7gCIDOvi4iLgedFxAPA56hmlgGuaVRPRBwKfAR4HNVYnA18B7i+Pi63RcSxmfnr4jF7AJ8F9gDupfrZa7q9zLy+wXHo7//3wAJgZn1czsrML9fH9i/qfawA3ks1Uz4D6AEuycxP1Jv5dH28DMjSGHMNsjRx3BgRt/f/Ac4d2CEi9qIKSgdl5oHADcAL6sB4K/DWzPwycCFV+HsOVXDeHzgrIuYA/xc4qf6I/UaqGbt+zwIOz8yXAEcD/5GZf5GZc4EfUgXafk/JzEOAk4APU81uHkgVcN7UoPbX1ts8KDOfC/wMuDwzP081I/qlJuEY4N8y8wCqwLY4IqY2216Tx2/z+EXEFGAp1ZKIA6gC61l1KG9kZ+AgqjcV50bEc+r2ZwOvrutZX/Q/lypg7Uf1puWQeh9vpwrMB2Tm/sADVLO3A70U+GVmrmpSz6zMPBT4L3XdTx3YITOvzsz/CawecNdewG+L2/cBs6lmjBvdt2fd/kBmbmlw32lU4/V84FBgn4h4QrnD+udwKfCW+lidAiyhCtWvANZm5rwyHNcuAm7JzGcBb6YK6U231+g41P33pnpjeHjd/x089vdtb+B5mXkS1Rud5fXPxSuAF0fEJIDMvAV4erP9SGofA7I0cbykDgXz6vDaaFnF/VSzWj+KiPOA2zPz6gb9jqaaMd2ameupAujRwIuBOzJzBUBmfpbHBqaf9M9e1rOvl0fEmyLiY1RhcFbR96r67/4Qc31xe+cmNf1TZj5c3/4YcER
ETGt0MAb4Qv337VTrZGePcnvNjt9c4OnAZfWbk29RzSw+r8l2LqqP7X1Uz/uouv23mXlvg/5HApdm5ubM3JCZh2XmN6lm6RcAP673+9fAMxs8fl/gV9t4XssAMvN+4Hc0Pv7NTKL6lGGgzQ3u62nSXt53PXBsRFxLtUTj7Zn5pwF9X0C1pv77dd0/p5o9PnyIWo+kfhOUmb+i+qRkxNurx+hk4MSI+CBwOo/92b6lnjUH+DLwtoi4iuqTmjcPeGNwNxBD1C2pxQzIkh5RvzAfRrWsYhVwQUR8uEHXgQFmEtXH1puogkypfLF/qP8fEfE/qD5a/jNVQP3igMeWM6Rk5sYhyp/coKYpDeppZGO9j/7H9wyxva0Dtjutfnyz4zcZ+NOANygvpFqb28im4t+TqIIhFMevQf9Hao2IvepZz8lUs579+zyYxuuWt7Lt14O1A/oO55j2+w3VcoJ+TwL+WL/xGHjfHlQzxb8Bdo+InoH3ZeYPgadSLT94CvCD+gttpYFjB4/+jG7LwOfWPw4j2l5EPB/4HtUbrRuolrmU231kHOv1xfsA/0z1humnEbFn0Xcjj46/pDFiQJb0iIjYn2opwS8y8wPABVQf9UMVFvoDwf8DzoyInoiYDrwe+FeqWbW5EfHcenvHUq2jbTSD+HKqJRCXAgnMpwoio3U98LcR8fj69puBm+oZ7lZv70GqpSVExDOB/ufb7PglsDYiTqr77VX3Gxjs+p1c93sy1ezxdUPU+jXglIiYVI/HUqqg3j9O0+qP7T9DdTaJgZJqhrsdbgBeWJ+VAqrZ1P4zoSyjOsZTImIn4L8DV9cz578CXgUQES+neqP103pG9p31zPxbgJ9TLT0pfQ/YNyIOrh//LKpPN745RK3XU/0s9x/7l4xyey8Gbs3M86k+LfhrmvxsR8QXgFdl5hVUa+VXU49F/QZhb6rxkTSGDMiSHlEvjfhn4NaIuBX4W+B/1Xd/BfhARJxCFRZ3BX5a/0ngfZn5B6ovtX0uIn5EFYI3Uc0SD3Qe8IaI+AlwM/Aj4BnbUf6lVEHxBxHxC+D5QLM1x9u7vfcCR0XEz6jWlt4EzY9fZm6gWupwav18b6AKed9psu+nRsRtVIHtzZk5VEA6h+qLYyuovoh3bWZeBfwDcE/ddgfVLOaiBo//GlUA3GmI/YxYZv4OeC3VWSh+QbVuvb+Gi6mWzKygWoN+aWZ+q77v1cDp9TF+H3B8PUP/UWBe3X4r1RKEKwbs8/fA8cDHI+KnVJ9QvDYzfzlEuWcAz6zrvJRqyc1otvdFYJd6O3dQzRjvHBG9Dfr+A9VSjBXA96mWXNxU33cg8OvM/M0QdUtqsZ6tW7f3jDmSVImI2VRnC3hPZv65/qj5GmCP3P7Tc00I9VkjjsvqNG5jud+/BzZlZqMlNeqA+uwy/5KZ13S6FmmicQZZUsvUX8DbAPyw/lLYp6hOyWY4Hv/OA14aEbt1uhA9cqGQrYZjqTOcQZYkSZIKziBLkiRJBQOyJEmSVBhPl5qeTnU6pJV4zkdJkiS1z2Rgd6oz6Aw6Heh4CsgHUZ3qSZIkSRoLhwLfHtg4ngLySoA//vFhtmxp3RcH58yZxapVzS4+pfHMsetejl33cuy6l2PXvRy7sTdpUg9PfOLjoc6fA42ngLwZYMuWrS0NyP3bVHdy7LqXY9e9HLvu5dh1L8euYxou6/VLepIkSVLBgCxJkiQVDMiSJElSwYAsSZIkFQzIkiRJUsGALEmSJBUMyJIkSVLBgCxJkiQVDMiSJElSYTxdSW+H0Dt7JjOmDz6s69ZvYs3qtR2oSJIkSSNhQG6xGdOnMH/RskHtyxcvYE0H6pEkSdLIuMRCkiRJKhiQJUmSpIIBWZIkSSoYkCVJkqSCAVmSJEkqGJAlSZKkggFZkiRJKhiQJUmSpIIBWZIkSSp4Jb0xsmHjZvr6ege1ewlqSZKk8cWAPEamTZ3sJaglSZK6gEssJEmSpIIBWZIkSSoYkCVJkqS
CAVmSJEkqGJAlSZKkggFZkiRJKhiQJUmSpIIBWZIkSSoYkCVJkqSCAVmSJEkqGJAlSZKkggFZkiRJKhiQJUmSpIIBWZIkSSoYkCVJkqSCAVmSJEkqGJAlSZKkggFZkiRJKhiQJUmSpIIBWZIkSSoYkCVJkqSCAVmSJEkqGJAlSZKkggFZkiRJKhiQJUmSpMKU4XaMiPOAXTJzYUTMAy4BZgM3Aadn5qaIeDKwBNgVSODEzHyoDXVLkiRJbTGsGeSIOAI4pWhaApyZmXOBHuC0uv0TwCcyc1/gVuCdLaxVkiRJarshA3JE7Ay8D3h/fXtvYGZm3lJ3uRw4PiKmAi8GlpbtLa53h7Nh42b6+noH/emdPbPTpUmSJE1Iw1li8SngHcBe9e09gJXF/SuBPYFdgNWZuWlA+4jMmTNrpA8ZUl9fb8u32SrTpk5m/qJlg9qXL17AjHFc91gZz2OnbXPsupdj170cu+7l2I0v2wzIEXEq8NvM/HpELKybJwFbi249wJYG7dTtI7Jq1UNs2TJwM6PX19fLgw+uadn2hrO/VhnLusejsR47tY5j170cu+7l2HUvx27sTZrUs81J2aFmkF8F7B4RtwM7A7OoQvDuRZ/dgAeA3wFPiIjJmbm57vPAdtQuSZIkjbltrkHOzJdl5rMzcx7wLuArmflaYF1EHFJ3ew1wXWZuBG6mCtUAJwPXtaluSZIkqS1Gex7kE4ELIuJOqlnlC+v2NwKvj4g7gEOBs7e/REmSJGnsDPs8yJl5OdWZKcjMFcDBDfrcCxzemtIkSZKkseeV9CRJkqSCAVmSJEkqGJAlSZKkggFZkiRJKhiQJUmSpIIBWZIkSSoYkCVJkqSCAVmSJEkqGJAlSZKkggFZkiRJKhiQJUmSpIIBWZIkSSoYkCVJkqTClE4XoMY2bNxMX1/voPZ16zexZvXaDlQkSZI0MRiQx6lpUyczf9GyQe3LFy9gTQfqkSRJmihcYiFJkiQVDMiSJElSwYAsSZIkFQzIkiRJUsGALEmSJBUMyJIkSVLBgCxJkiQVDMiSJElSwYAsSZIkFQzIkiRJUsGALEmSJBUMyJIkSVLBgCxJkiQVDMiSJElSwYAsSZIkFQzIkiRJUsGALEmSJBUMyJIkSVLBgCxJkiQVDMiSJElSwYAsSZIkFQzIkiRJUsGALEmSJBUMyJIkSVLBgCxJkiQVDMiSJElSYUqnC9DIbNi4mb6+3kHt69ZvYs3qtR2oSJIkacdiQO4y06ZOZv6iZYPaly9ewJoO1CNJkrSjcYmFJEmSVDAgS5IkSQUDsiRJklQY1hrkiDgXOA7YClyamedHxJHA+cBM4EuZeXbddx5wCTAbuAk4PTM3taN4SZIkqdWGnEGOiMOAlwLPBQ4E3hQR+wOXAQuA/YCDIuLo+iFLgDMzcy7QA5zWjsIlSZKkdhgyIGfmt4CX1LPAu1LNOu8E3JWZd9ftS4DjI2JvYGZm3lI//HLg+LZULkmSJLXBsJZYZObGiDgHOAv4F2APYGXRZSWw5zbah23OnFkj6T4sjc4bvCPaEZ/njvicJgrHrns5dt3Lsetejt34MuzzIGfmuyPiQ8ByYC7VeuR+PcAWqhnpRu3DtmrVQ2zZsnXojsPU19fLgw+O3RmCO/kDPpbPcyyM9dipdRy77uXYdS/Hrns5dmNv0qSebU7KDhmQI2JfYEZm3p6Zf46Iq6i+sLe56LYb8ABwH7B7g3a1mVfYkyRJao3hzCA/DTgnIl5ENTu8APgU8JGIeAZwN3ACcFlm3hsR6yLikMz8DvAa4Lo21a6CV9iTJElqjeF8Se9a4Brgx8BtwHcz8wpgIXAlcAdwJ7C0fsiJwAURcScwC7iw9WVLkiRJ7THcL+m9B3jPgLavA/s36LsCOLgFtUmSJEljzivpSZIkSQUDsiRJklQwIEuSJEkFA7IkSZJUMCBLkiRJBQOyJEmSVDAgS5IkSQUDsiRJklQwIEuSJEkFA7IkSZJUMCBLkiR
JhSmdLqBb9c6eyYzpHj5JkqQdjQlvlGZMn8L8RcsGtS9fvKAD1UiSJKlVXGIhSZIkFQzIkiRJUsGALEmSJBUMyJIkSVLBgCxJkiQVDMiSJElSwYAsSZIkFQzIkiRJUsGALEmSJBUMyJIkSVLBgCxJkiQVDMiSJElSwYAsSZIkFQzIkiRJUsGALEmSJBWmdLoAtdeGjZvp6+tteN+69ZtYs3rtGFckSZI0vhmQd3DTpk5m/qJlDe9bvngBa8a4HkmSpPHOJRaSJElSwYAsSZIkFQzIkiRJUsGALEmSJBUMyJIkSVLBgCxJkiQVDMiSJElSwYAsSZIkFQzIkiRJUsGALEmSJBUMyJIkSVLBgCxJkiQVDMiSJElSwYAsSZIkFQzIkiRJUmHKcDpFxLuBV9Y3r8nMt0XEkcD5wEzgS5l5dt13HnAJMBu4CTg9Mze1vHJJkiSpDYacQa6D8FHA84B5wAER8WrgMmABsB9wUEQcXT9kCXBmZs4FeoDT2lG4JEmS1A7DWWKxEliUmRsycyPwC2AucFdm3l3PDi8Bjo+IvYGZmXlL/djLgePbULckSZLUFkMuscjMn/f/OyL2oVpq8XGq4NxvJbAnsEeT9mGbM2fWSLoPS19fb8u3uaMY78dmvNen5hy77uXYdS/Hrns5duPLsNYgA0TEs4BrgLcCm6hmkfv1AFuoZqS3NmgftlWrHmLLlq1Ddxymvr5eHnxwTcu2V253R9COY9Mq7Ro7tZ9j170cu+7l2HUvx27sTZrUs81J2eF+Se8Q4Erg7zLziog4DNi96LIb8ABwX5N2jUMbNm5uGPTXrd/EmtVrO1CRJElS5w0ZkCNiL+Bq4FWZ+Y26+fvVXfEM4G7gBOCyzLw3ItZFxCGZ+R3gNcB1bapd22na1MnMX7RsUPvyxQvwfawkSZqohjODfBYwAzg/IvrbPgkspJpVngFcCyyt7zsR+ExEzAZ+BFzYwnolSZKkthrOl/TeArylyd37N+i/Ajh4O+uSJEmSOsIr6UmSJEkFA7IkSZJUMCBLkiRJBQOyJEmSVDAgS5IkSQUDsiRJklQwIEuSJEkFA7IkSZJUMCBLkiRJBQOyJEmSVDAgS5IkSQUDsiRJklQwIEuSJEkFA7IkSZJUmNLpAjT+bNi4mb6+3kHt69ZvYs3qtR2oSJIkaewYkDXItKmTmb9o2aD25YsXsKYD9UiSJI0ll1hIkiRJBQOyJEmSVDAgS5IkSQUDsiRJklQwIEuSJEkFA7IkSZJUMCBLkiRJBQOyJEmSVDAgS5IkSQUDsiRJklQwIEuSJEkFA7IkSZJUMCBLkiRJBQOyJEmSVDAgS5IkSYUpnS5A3WPDxs309fUOal+3fhNrVq/tQEWSJEmtZ0DWsE2bOpn5i5YNal++eAFrOlCPJElSO7jEQpIkSSoYkCVJkqSCAVmSJEkqGJAlSZKkggFZkiRJKhiQJUmSpIIBWZIkSSp4HmRtNy8gIkmSdiQGZG03LyAiSZJ2JC6xkCRJkgoGZEmSJKlgQJYkSZIKBmRJkiSpMOwv6UXEbOC7wDGZeU9EHAmcD8wEvpSZZ9f95gGXALOBm4DTM3NTyyuXJEmS2mBYM8gR8QLg28Dc+vZM4DJgAbAfcFBEHF13XwKcmZlzgR7gtFYXLUmSJLXLcJdYnAacATxQ3z4YuCsz765nh5cAx0fE3sDMzLyl7nc5cHwL65UkSZLaalhLLDLzVICI6G/aA1hZdFkJ7LmN9mGbM2fWSLoPS6OLWGhsbO+xd+y6l2PXvRy77uXYdS/HbnwZ7YVCJgFbi9s9wJZttA/bqlUPsWXL1qE7DlNfXy8PPtj6y1X4gzw823Ps2zV2aj/Hrns5dt3Lsetejt3YmzSpZ5uTsqM9i8V9wO7F7d2oll80a5ckSZK6wmgD8veBiIhnRMRk4ATgusy8F1gXEYfU/V4DXNeCOiVJkqQxMaqAnJnrgIXAlcAdwJ3A0vruE4ELIuJOYBZw4faXKUmSJI2
NEa1BzsynFP/+OrB/gz4rqM5yIUmSJHUdr6QnSZIkFQzIkiRJUsGALEmSJBVGex7kCaN39kxmTPcwSZIkTRQmvyHMmD6F+YuWDWpfvnhBB6qRJElSu7nEQpIkSSo4g6y22bBxc8NLcq9bv4k1q9d2oCJJkqShGZDVNtOmTm66PMUrzkuSpPHKJRaSJElSwYAsSZIkFQzIkiRJUsGALEmSJBUMyJIkSVLBgCxJkiQVDMiSJElSwYAsSZIkFQzIkiRJUsGALEmSJBUMyJIkSVLBgCxJkiQVDMiSJElSYUqnC5D69c6eyYzpj/2R7OvrZd36TaxZvbZDVUmSpInGgKxxY8b0KcxftGxQ+/LFC1jTgXokSdLE5BILSZIkqeAMssbcho2b6evr7XQZkiRJDRmQNeamTZ3cdCmFJElSp7nEQpIkSSoYkCVJkqSCAVmSJEkqGJAlSZKkggFZkiRJKngWC00Yja7UB3ilPkmS9BgGZE0YXqlPkiQNhwG51mx2UZ3X7MIi6zdsZvq0yYPanRGWJEnbw0RY29bsojprWxcWadR+5QeP8Up9kiRp1AzI2uF4pT5JkrQ9DMia8Jot4QCXcUiSNBEZkDXhNZtxhubLOPxinyRJOy7PgyxJkiQVDMiSJElSwYAsSZIkFQzIkiRJUsEv6Umj0OzMF57dQpKk7mdAlkah2Zkvml2kZKTBudmVHZttZ6T9JUlScwZkqYVaFZy3dWXHRqeXG2n/ZrZ1yXXDtiRpomhLQI6IE4CzganARzPzonbsR+oWIw3OzWzroiat0Cxog+d+liRNHC0PyBHxJOB9wAHAeuC7EXFjZt7R6n1J3W6kl8Ueaf9mgbrZFQK3ZaTb6p09s60zzp1aVuIsuyTt+Noxg3wk8I3M/ANARCwFjgPOHeJxkwEmTeppeUHD3eauT5w5odrHY03jrX081jSS9mlTJ/O6994wqP3Ss49q2t5s+6PZ1sMj+H2eNWsG0xsEz22F+Ub7vfh/H9E4yK/fxEMPrRv+fpv0nzF9SsP9jmbf7TbS59avFf8Pj3Tfo621XUb689ipOgdqx2uoxoZjN7aK493wBaZn69atLd1hRPwf4PGZeXZ9+1Tg4Mx8/RAPfRFwc0uLkSRJkpo7FPj2wMZ2zCBPAsrU3QNsGcbjfkhV5EpgcxvqkiRJkqCaOd6dKn8O0o6AfB9V0O23G/DAMB63ngYJXpIkSWqDXze7ox0B+WvAeyKiD3gYOBYYanmFJEmSNC60/FLTmXk/8A7gRuB24AuZ+YNW70eSJElqh5Z/SU+SJEnqZi2fQZYkSZK6mQFZkiRJKhiQJUmSpIIBWZIkSSoYkCVJkqRCO86DPG5ExAnA2cBU4KOZeVGHSxIQEbOB7wLHZOY9EXEkcD4wE/hScZnyecAlwGzgJuD0zNwUEU8GlgC7AgmcmJkPdeCpTCgR8W7glfXNazLzbY5dd4iIc4HjqK5yemlmnu/YdZeIOA/YJTMXjnSMImIn4PPA04AHgVdm5r935IlMIBFxI9VYbKyb3gA8nQa5ZKS/j2P5PCaqHXYGOSKeBLwPeBEwD3h9RDyzs1UpIl5AdcXEufXtmcBlwAJgP+CgiDi67r4EODMz51Jdsvy0uv0TwCcyc1/gVuCdY/cMJqb6P++jgOdR/T4dEBGvxrEb9yLiMOClwHOBA4E3RcT+OHZdIyKOAE4pmkY6Ru8Fbs7M/YDPAB8bk8InsIjooXqd2z8z52XmPKorDQ/KJaN8HVSb7bABGTgS+EZm/iEzHwaWUs2gqLNOA87g0cuPHwzclZl31++KlwDHR8TewMzMvKXud3ndPhV4MdV4PtI+RrVPZCuBRZm5ITM3Ar+g+s/fsRvnMvNbwEvqMdqV6pPDnXDsukJE7EwVqt5f3x7NGP0l1QwywBeBo+v+ap+o/74hIlZExJk0zyUjeh0c02cxge3IAXkPqhf1fiuBPTtUi2qZeWpm3lw0NRunZu27AKuLj5gc1zGQmT/v/08
6IvahWmqxBceuK2Tmxog4B7gD+Dr+3nWTT1FdnfaP9e3RjNEjj6nvXw30tbfsCe+JVL9rfwMcAZwOPJmR/d6ZYzpoRw7Ik6jW2/XroXpB1/jSbJyG2w6O65iJiGcB/wq8Ffg3HLuukZnvpgpFe1HN/jt241xEnAr8NjO/XjSPZox6BrT7ethmmfm9zDw5M/+Umb8HLgXOZft+7xy3MbQjB+T7gN2L27vx6Mf6Gj+ajVOz9t8BT4iIyXX77jiuYyIiDqGaEXl7Zn4Wx64rRMS+9Rd9yMw/A1cBh+PYdYNXAUdFxO1U4eqvgFMZ+RjdX/cjIqYAvcCqtlc/gUXEi+q14/16gHsY2e+dOaaDduSA/DXgiIjoi4jHAccC13e4Jg32fSAi4hn1f+wnANdl5r3AujqUAbymbt8I3Ez1wgFwMnDdWBc90UTEXsDVwAmZeUXd7Nh1h6cBn4mI6RExjeqLQJ/CsRv3MvNlmfns+gte7wK+kpmvZeRjdG19m/r+m+v+ap+dgI9ExIyI6KX6kuVJNM4lI/q/dMyfyQS1wwbkzLyfat3WjcDtwBcy8wedrUoDZeY6YCFwJdX6yDt59EsmJwIXRMSdwCzgwrr9jVTf/r0DOJTqlDlqr7OAGcD5EXF7PaO1EMdu3MvMa4FrgB8DtwHfrd/kLMSx61YjHaN3Ai+MiJ/Xfc4Y43onnMz8Ko/9vbssM79Dg1wyytdBtVnP1q0DlyxJkiRJE9cOO4MsSZIkjYYBWZIkSSoYkCVJkqSCAVmSJEkqGJAlSZKkggFZkiRJKhiQJUmSpMJ/AhprhVt5imRGAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(figsize=(12,5))\n", "ax.hist(house_df['price'], bins=100)\n", "ax.set_title('Histogram of house price (in 1000s of dollars)');" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# This takes a bit of time but is worth it!!\n", "# sns.pairplot(house_df);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train-Validation-Test Split\n", "\n", "Up until this point, we have only had a train-test split. Why are we introducing a validation set? What's the point?\n", "\n", "This is the general idea: \n", "\n", "1. **Training Set**: Data you have seen. You train different types of models with various different hyper-parameters and regularization parameters on this data. \n", "\n", "\n", "2. **Validation Set**: Used to compare different models. We use this step to tune our hyper-parameters i.e. find the optimal set of hyper-parameters (such as $k$ for k-NN or our $\\beta_i$ values or number of degrees of our polynomial for linear regression). Pick your best model here. \n", "\n", "\n", "\n", "3. **Test Set**: Using the best model from the previous step, simply report the score e.g. R^2 score, MSE or any metric that you care about, of that model on your test set. **DON'T TUNE YOUR PARAMETERS HERE!**. Why, I hear you ask? Because we want to know how our model might do on data it hasn't seen before. We don't have access to this data (because it may not exist yet) but the test set, which we haven't seen or touched so far, is a good way to mimic this new data. \n", "\n", "Let's do 60% train, 20% validation, 20% test for this dataset." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train Set: 60.00%\n", "Validation Set: 20.00%\n", "Test Set: 20.00%\n" ] } ], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "# first split the data into a train-test split and don't touch the test set yet\n", "train_df, test_df = train_test_split(house_df, test_size=0.2, random_state=42)\n", "\n", "# next, split the training set into a train-validation split\n", "# the test-size is 0.25 since we are splitting 80% of the data into 20% and 60% overall\n", "train_df, val_df = train_test_split(train_df, test_size=0.25, random_state=42)\n", "\n", "print('Train Set: {0:0.2f}%'.format(100*train_df.size/house_df.size))\n", "print('Validation Set: {0:0.2f}%'.format(100*val_df.size/house_df.size))\n", "print('Test Set: {0:0.2f}%'.format(100*test_df.size/house_df.size))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Modeling\n", "\n", "In the [last section](https://github.com/Harvard-IACS/2019-CS109A/tree/master/content/sections/section3), we went over the mechanics of Multiple Linear Regression and created models that had interaction terms and polynomial terms. Specifically, we dealt with the following sorts of models. \n", "\n", "$$\n", "y = \\beta_0 + \\beta_1 x_1 + \\beta_2 x_2 + \\dots + \\beta_M x_M\n", "$$\n", "\n", "Let's adopt a similar process here and get a few different models." 
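, "\n", "\n", "As a reminder of what such a model looks like in code, here is a hedged sketch on noise-free toy data (the variable names and the true coefficients are invented for illustration). `PolynomialFeatures` builds the squared and interaction columns, and `LinearRegression` then fits one coefficient per column plus an intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Toy data (an assumption for illustration): y depends on x1, x2 and their product
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(200, 2))
y_demo = 1.0 + 2.0 * x_demo[:, 0] - 3.0 * x_demo[:, 1] + 0.5 * x_demo[:, 0] * x_demo[:, 1]

# degree=2 expands [x1, x2] into [x1, x2, x1^2, x1*x2, x2^2];
# include_bias=False because LinearRegression fits the intercept itself
expander = PolynomialFeatures(degree=2, include_bias=False)
x_expanded = expander.fit_transform(x_demo)

# Ordinary least squares on the expanded columns
lin = LinearRegression().fit(x_expanded, y_demo)
print(x_expanded.shape)
print(np.round(lin.intercept_, 3), np.round(lin.coef_, 3))
```

Because the toy response has no noise, the fit recovers the coefficients used to generate it, with values near zero for the two squared terms that play no role."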
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a Design Matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the model equation above, we obtain the following: \n", "\n", "$$\n", "Y = \\begin{bmatrix}\n", "y_1 \\\\\n", "y_2 \\\\\n", "\\vdots \\\\\n", "y_n\n", "\\end{bmatrix}, \\quad X = \\begin{bmatrix}\n", "x_{1,1} & x_{1,2} & \\dots & x_{1,M} \\\\\n", "x_{2,1} & x_{2,2} & \\dots & x_{2,M} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "x_{n,1} & x_{n,2} & \\dots & x_{n,M} \\\\\n", "\\end{bmatrix}, \\quad \\beta = \\begin{bmatrix}\n", "\\beta_1 \\\\\n", "\\beta_2 \\\\\n", "\\vdots \\\\\n", "\\beta_M\n", "\\end{bmatrix}, \\quad \\epsilon = \\begin{bmatrix}\n", "\\epsilon_1 \\\\\n", "\\epsilon_2 \\\\\n", "\\vdots \\\\\n", "\\epsilon_n\n", "\\end{bmatrix}.\n", "$$\n", "\n", "$X$ is an $n \\times M$ matrix: this is our **design matrix**. $\\beta$ is an $M$-dimensional vector (an $M \\times 1$ matrix), and $Y$ is an $n$-dimensional vector (an $n \\times 1$ matrix). In addition, $\\epsilon$ is an $n$-dimensional vector (an $n \\times 1$ matrix). Together they satisfy $Y = X\\beta + \\epsilon$, where the intercept $\\beta_0$ is either fit separately or absorbed by adding a column of ones to $X$." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2400, 9)\n", "(2400,)\n" ] } ], "source": [ "# Predictors only: drop the response so it does not leak into the design matrix\n", "X = train_df[cols_of_interest].drop(columns='price')\n", "y = train_df['price']\n", "print(X.shape)\n", "print(y.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scaling our Design Matrix\n", "\n", "### Warm-Up Exercise\n", "\n", "For which of the following models do the units of the predictors matter (e.g., trip length in minutes vs seconds; temperature in F or C)? A similar question would be: for which of these models do the magnitudes of values taken by different predictors matter? 
\n", "\n", "(We will go over Ridge and Lasso Regression in greater detail later)\n", "\n", "- k-NN (Nearest Neighbors regression)\n", "- Linear regression\n", "- Lasso regression\n", "- Ridge regression\n", "\n", "**Solutions**\n", "\n", "- k-NN: **yes**. Scaling changes the distance metric, which determines what \"neighbor\" means.\n", "- Linear regression: **no**. If a predictor is multiplied by $c$, its fitted coefficient is divided by $c$, so the predictions are unchanged.\n", "- Lasso: **yes**. If the coefficient were divided by $c$, its penalty term would also be divided by $c$, so the penalized fit changes with the units.\n", "- Ridge: **yes**. Same as Lasso, except that the penalty term is divided by $c^2$.\n", "\n", "### Standard Scaler (Standardization)\n", " \n", "[Here's](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) the scikit-learn implementation of the standard scaler. What is it doing though? Hint: you may have seen this in STAT 110 or another statistics course multiple times.\n", "\n", "$$\n", "z = \\frac{x-\\mu}{\\sigma}\n", "$$\n", "\n", "In the above setup: \n", "\n", "- $z$ is the standardized variable\n", "- $x$ is the variable before standardization\n", "- $\\mu$ is the mean of the variable before standardization\n", "- $\\sigma$ is the standard deviation of the variable before standardization\n", "\n", "Let's see an example of how this works:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " x z_manual z_sklearn\n", "count 4000.000000 4.000000e+03 4.000000e+03\n", "mean 2096.645250 -2.775558e-17 -4.096723e-17\n", "std 957.785141 1.000000e+00 1.000125e+00\n", "min 384.000000 -1.788131e+00 -1.788355e+00\n", "25% 1420.000000 -7.064687e-01 -7.065571e-01\n", "50% 1920.000000 -1.844310e-01 -1.844540e-01\n", "75% 2570.000000 4.942181e-01 4.942799e-01\n", "max 13540.000000 1.194773e+01 1.194922e+01" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABWUAAAE/CAYAAAAuSaGoAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3debgkZXX48e/sM5E7avAaUNwS5QRXDAIa3BIRowEnRpEEUVGB+DNGY1CjEVzjkihoNG5hESIuRBCRVRQx7rjihh5NgkaUREISGcjsM78/3rpDT0/3vX37dld31/1+noeH6a7q6vdU1a3z1umqt5bs2LEDSZIkSZIkSVI9lo66AZIkSZIkSZK0mFiUlSRJkiRJkqQaWZSVJEmSJEmSpBpZlJUkSZIkSZKkGlmUlSRJkiRJkqQaWZSVJEmSJEmSpBpZlJ1QEbEjIu7U9t6xEXFx9e/XRsQz5ljGKyNi3TDbOSwRsX9E/GtEfD0i7rnAZS2LiAsj4ocR8fxe18vMNoiIJ0bE2+eY9/SIOHQh7WxZ1qsj4u/7+NwrIuLfI+LMQbRjHt870P0sIl4cEWctcBkPiYgfV/9+bkS8bAHLumLmbzEiLo2I+y6kbZIUEfescsw/d5h2Vqc+QA1tOisiXlz9eyDf34T8MOrjftX3+/eI+EREHBgR7+nhM63b8pqIuMMs887Zx5GkYYqIh0bEVRHx7Yj4bkRcFhH3a5l+xSByUsvyLo6IYxe4jJ35aK7zwIi4S0R8cSHft1AR8e6IuC4iXh8Rx0XE83r4zI+rnPmQiDhvjnnnrE0M0yi/vzWPRsRnIuIpHebZuY9ExGkRcUCHeXbWetQsy0fdAA1HZr6yh9l+F7h22G0ZkicCV2XmcQNY1l2BxwG3y8xtEfEZ5rFeMvPjwMfnmGcQ7Vyo5wBHZ+bna/7esd7PMnPOE9g5PLZlWU9Y4LIkacZGICLiHpn5E8qL2wGHjLZZAzXx+WEMjvvPAP4qM8+pigj7zOfDmbn/HNPn7ONI0rBExCrgYuCwzPxG9d4xwGURca/M3EZLX3wczXUemJk/B367puZ08yfA3TPz+qqY/N1eP5iZXwN2KzS2zdNLbWJoRvn9fdQKHgu8d6iN0lixKNtQMwfTzHxLRLwGeBKwGbgJOBb4Q+AhwJsjYhvwaeCdwP7ADuAySid/a0Q8AfgbYBtwDXAo8HDg0ZRC3+2AXwKHA+8G7gPsCaynFAGzKnR+HXgocGfgH4C9gEdVn39qZn6nQxwnA38MbAV+CDwfeAzwPGBZRKzJzKe1fWa3eDPzhoh4MvA6YANwCfBXwB2By4EVwNcj4rTW9ZKZF/Swro+lJKK/AL4I3CUzN0fEMuDfq/X1buDvga8BVwKXAgdX3//SzLwgIn4FeE+1jv6X6kQ1M4/t8LX7RcRngV8Fvgk8LzPXR8Rdq++5exXT
hzPzDRFxLuVE7YyIeCXwhapN9wSWAGdn5purq44/B3y/mvYo4F6U7X87yj7wmszc7Ve6Hvez71H2sylgb8r+dFRmboyIjcCbgMOqaX+bme+OiBXA2ykJ6hfAf1L2NyLiocDfAquqz3wyM5/TJY4/AF5UfXbnvhYRrwbuVMV4UUtIewFbMvNuEXE4ZX9ZSdl/z87MkyPifdW8V1V/J58DnpKZX4uIE4AXVOvsP4HnZ+YPq7/Nm4EHAHcDvg08IzNvaV+nkha1bcC5wNOAN1Tv/SFwIXAiQEQsBd5KyRtTlOP5cZn5hdmONRGxA5jOzP+qlrMDmAb+u9vyOjUwIj4J/FNmnla9PgnYMzNf1DZfk/PDjyl9gD2A1wP/BtyfkoP/pNoW08D7gN+o4v8PSh/t1W3r6eHAqcAySl/sjZl5fpXbzwLuAvyEsm+cDzwIOAi4V0TcCzgeuH1EvC8zn7XbBuugZdt/HDglM8+v3v+bapbvU/La4VVf7kuUHwbuDnwKOCEzt1d9oZdR+lifBl6YmZ5nSFqoXwHuQDnGzvgAJb8ti4jTq/dm+uIPonOf/dF0P0bfBTib246xd575ooh4NqVguZJy3vWmKv8cy67nwYfRPR99hnJ+tg14VUsc9wYuAE6m5IQ9qrxzT0reugfwM+CY6lz2QMr520rgX6vpf5GZn2ldYRHx/4DnUnLuxirOayPikVUbdwBfBh5POZ9/PyXfXxYR/0S5+OmxEbEhM9/Zcavs+n2PruI7BPgpsG9m/kc17Wrg1cBR3Fab6JbTlwFvrr7/l8DVwH0z89Ft33csVV5qfz1LHj1rgN+/F/CPlP4BwCWZeXI17eXAMym1ix9R+jtPam1vNd9y4IPAlmr+T1Xr8MGU/fADEfGMzLy6yzq/PfB3lD7eCkp94SVV7abXffbsqm3bKfWb/wOemZnf7/SdGh6HL5hsV0W57eyaiLgGeG37DBFxN+DPgQMz8yHAFcDB1QH2a5Q/3gsoB+ibKH/YD6EktBdHxJ6UA/Ux1dUUV1GuLJ1xP+DRmfk7lAP7/2bmwzJzX+CrlCLqjHtm5iHAMZQTpc9Ubboc+LMObX9WtcwDM/OBlF/szsrMD1CKl+d2KMh2jDci9gbOpBwQDwA2Acsycz3wBGBDZu7fYb30LDN/SDmpfGL11mHAdR0ObL8OfCIzD6KcwLytev9kyg8lv0kp5D54lq+7N/BkyvZaApxUvf9+4MwqxoOAQyPiqZl5FPBz4GmZeS6lM3NVZj6AkkCPiYg/qpaxD/C6ahtupJxIPj0zfwtYB7w7Iu7e2ph57GfHUzpHD61iuBfw+9ViVgH/lZm/TTnBfWtErKYU4PcF7kvp6LR+9wuBV2bmwdX0J7bc7tEax69ROgSPzMwDKZ2UXWTmT6t9YH9KgtoIPC0illAKIM+sYnso8PKIuFPLSe/vZOZPW9bH7wIvrd5/ECXpfqxaFsABwO8B+1E6Xke2t0eSKJ3+p7e8fialODfjYErn/WGZeV9KB7v1dvv5HmvmWl67d1KO6zMF4udQ8vNOTc4PXdbfKZn5YErunCmmvx34XmbuR9kG3a6Ieg1wapXDn025khjKSfiXM/N+1br83apdL2pZh68DXgl8rteCbJvTgGdBGdaJ0lc7vcN8v0E5iX8gpY/2qCjDN/wNcGgV+82UE2JJWpDM/B9Kn/ryiPi3iHg/5Vj1qczc3NoXB66nS5+9mqfbMfqd3HaMfQHlXIyI2IOSm55QfeYoyjnsjNbz4Nny0UwsF7TkkpMpP9C9uEPYjwCOzMzfBG4FnlsV8T4KnFydF7+dcjHVLqrj99uA36ty2j8AD4+IlcB5wIurWD5PKeqSmY+YWYdVLvk48NZeCrJt8f2SUmQ+pmrLfpQfMT/RNmu3nH4cpd9yf+BhlHwzX93y6CC//3jg36pz40cA94mI20fEEylF2Idl5v2B69i1FjJjJfARSvH+mMzcOjMhM1/BbefsHQuylbcCX6/ifDCl
QPwX89xnofww/WdVe69m9j6fhsSi7GT7nZkDe3Vw73RZ/s+AbwHfiIi3ANdk5sc6zPd44O8zc0dmbqKcVD0eeCRwbWZ+CyAzz6Z0tmd8OzNvrqadB5wVEX8WEX9H6bS3/qr50er//1r9//KW17/apU3vy8xbq9d/BzymSirddIv3kKqtM7dJzntM1h6dTjkYQ+kwnNZhni2UK2UBvsFtsT8BOCMzt1fr9OxZvuejmXljZu6gdCoeG+W21kcBr6uK9F+mdAh2Sdhx2+2v74SdCfQsyvqG8svel6p/P4zyC+LHqmVeSvnV8YFt7el1P/tL4MaIeCnlJPMu7LqPXNiyXlZRfsk7FPhg1fG6lVJQnvFM4A4R8VfAu4A1LctrjeMxwBVZ/WpL6aB0VHXcLgNenpmfrdbxEcABEfEqyq+vS6q2dfN7lB8NbgTIzLMoP2bcs5p+eWZuyswtlKuyOu3/kha5zPw6sC0iDqiKm1OZ+d2W6V+i/Cj3J9Wxd+aKzRnzOtb0sLx2FwG/FhEPogwDdF1mZts8jc0PHWb5SWZe09LO1vz+DwCZeQPlxLiTfwLeGREfoJwY/lX1/u9SFeOrH4A/2a2NC3Au8LDqCqDHAT/MzB91mO+iln7Kv1BifBxlHV5fzfOOIbRP0iKVmadSfkB7AXADJV98s7pasHW+ufrs3Y7Rh3LbMfZfKFf7k+UutsOB34+I1wGvYNe8tPM8mNnz0S6qOzneDRyRmf/ZYZbPtCz3m1U7H1C16bLq/1fRYYiBLMM5fAT4YpRnkPwvcAbl3G1TZn6qmu8D1bRBO52Sf6GcC5+Zmds7zNcppz8B+MfM3JiZm+nvFv5ueXSQ33858OSIuJRyRerLqvPpQ4GPVD8kkJl/kZmv7/D5UyiF+9dV+2w/Dqf01a6h3I18EPCAee6zUAq7M7m79W9CNbIo23DVQfBRlELhTZRfgv62w6xLKcW21tcrKCcuS9rmbT2w7rzlurpV4gzKpe8fBD7U9tlNbW3bMkfzZ247aG3T8g7taV1mt3g3tH1uru/u10coV+buV7XjIx3m2dySnHa0tKt9XW+b5Xtapy2lxLOs+vxvtxTqH8ptvwK3zt++Dme2N5SEPfOL3TLg+23F/4fS9ovnPPazDwEnUG4Neivl4N/alg3V8ma2+5K2/0NZTzM+S0mgP6BcKf6zlnlb45htGTtFGULiYsrVWh+q3rsdpUP0W1V7X0JZ3133Q3bfd2e+f2Ydb2h5v3UfkKR276dcdfL06t87RcTvU4bjgXKC8R46HFMr7ceaJdUydv7Q2cPydlGd/L2XcjXKs2m7Sraap7H5oYNu67un/J6Z76WceH+SUuj8dnX1TnsfZrereRcqM/+P0mc5mnIi3ekqWegc43z6L5LUs4g4JCJekpnrM/PizHwp5Wq/HbSNJdtDn73bMbo9P26tlrcPZSide1CuLD2JXbUPPdZLLtmXMvzMMdn9NvFejrPQPZccQylO/wvlyscPsXsegSGcD2fm54DlEXEQJZ90e8B0p5zeay5p3147+zGz5NGBfX9mfpVyN88/UC64+Up1J85WWs7/IuIO0fmB5O+nFOU7XbzVq2WUq6lnzs8PBp7fxz7rOekYsCjbcNXVK9+lFNbeSDnRObCavJXbikSfoPwhL4kyoPoJlIPZF4B9I+KB1fKeTBnXp9OvOo+jDC9wBpCUZLCQ29cuB55dJVgov45+trqSt6NZ4v0S5daC36pmPXaW721dL/OSmRuBD1N+bT2/Osnp1SXAsyJiaXXydzSd1zOU2zDvWN2icjxwWfWr15cpY9sS5WnKX6AMOdDaxvXVfH9azXd7yoNCOl1582XKentkNe/+lPFxWoewmM9+9jjgtVmGUICSQObaRy4DnhERq6ukelRLfAcCf5mZH6XcjnrvLsu7AjisSlTQYftX6/KfKFdxvbFl0n2AtcBJmXkR5QrwVS3fs43d95fLgT+KMo7gzFAcN1E6R5I0H+dQbnk/
ivKDZ6vHUq5cfDflNvY/oLe8eyNlqCIouWYhyzudckv/AZTbFnfR8PzQq0soQztQDQv1JDrk9yhP335wdXfFCZT+1l7V559bzbMP5WqcTvruv1ROo1zhdAilaNCrT1CGS5rpG4zDw00lNcONwElRxgqdsTdwe24bA3ymLz5Xn72byynHXKoh2mZu7X5I9f1/TckVM2OYdlpex3zUqroT4TLKkDOfmaNN7b4PbIqI36uWdRCl+LhLLomIO0XET4GbMvNtlKLcgZRz8w0RcUQ13xMo44l3stBccjrljolvZ8vwbj24hDKk3qpquIZj6XwufCNw/2pdr6DlIWOz5NGBfX9EvIkyjMTHKEMlfY8y5MGngD+MiLXVrK+mOi9v8xXK8BX3jojjO0zvZf1/AnhRS+3m45ShEuazz2pMWJRtuCzDDvwT8LWI+BrlSpaZg8PHgTdGxDMpBc87U5LbdygH7tdn5n9THrT1jxHxDcpJ01bK1bDt3kK5jP7blIdofINyEtSvMygHt69ExPcpv3p2GsNtp27xVnEcCfxDFcdBsyymdb3047Rq+d2uMunmjZRx6r5DifsXdF7PUB4CdnE17/9SBiuHcnL90Ij4DmVcmA9Vt6e0explKIjvUBLDR9l1nEIAqtvvn0x5EMu3KL/sPT0zf9w2X6/72V8BF1Tf+17gn5l7H3kvpTjw3Wr+66rv/F/KOvtGRHyX8kvwFzotL8tD5F4KXFm1r9Mvpk+ljF94QER8M24bq/kXlHX9g2o/PIKy/me+5yPAP0fE/Vu+75OUwsOnI+J7lJPcw7vcviNJXWXmzygnYz+qclmr9wCPro6p36AMB3SvKOO7zuYFlNv7vkEZb/aGfpeXmb+gHKM/1OkOmCbnhygPh+nFi4DfrGI7n3I1cKf8/lLgtRHxTeAzlAdr/rj6/N2qz59FeYhoJ18Gfj0iPtpl+qyyGi4DOK/6kbnXz/2wauMnqnW4H937L5LUs+r48gfAG6KMKXstJac8K3PncDkfoeSA7czeZ+/mT4H7Vp85g3KlIZSi1vWU8+LvU4aFu7HL8jrmozavoZxv/3lLHrm0w3y7qe7ueDLw6ipHnEgZk/b/2ub7L0pB7sqI+DrlHPH46vN/CLy0+vwfUc47O7mMMo7ty3tpWwdnU4bPm++58FmU89dvUh6evZnOueQKyjr+AeWumK+1TOuWRwf5/W8D9q/6F1+jbOsPZ+allGEFv1Dl670owwfspsqxx1LOsdvHrv0ocE5EHDZLW19AGXLhO5QHuX6HMnbsfPZZjYklO3b0O4yFFoPql56TgFdn5v9VV5peAtxlAWOgjFyUceFuzMyxuUQ/yoO2bs7MS6sT4PMpY7S9e8RNkySpoyqffpXyoKz5XBGzaETE84BvZuaXqitaPge8amZswD6WdzGlcHrWAJvZt4i4F+WOm9dl5vaI+EPKVcoHj7hpktQYEfFm4C2Z+Z9Rxpn/FvDr1Q+R/SzvFuD+8yhaDlVVhLxzZp5Tvf47YGNm/uVi+H4tXstH3QCNt8y8OSI2A1+NiC2UsWeeOskF2V5FxEvofmXum7tcgboQ3wXeGxFvoIyNcxXz/4VRkqRaVLfdvYFSYLQg2921wDuq2wdXUh4E0ldBthcREZQHd3WSmbnbbbULdD3lwWzfiYitwC8pV0RLkgbnJ5QrYGfGyT2u34JsLyLiaZRxeTv5QGa+ecBf+T3gJVEe+LmMUnT+fwP+jnH+fi1SXikrSZIkSZIkSTVyTFlJkiRJkiRJqpFFWUmSJEmSJEmq0TiNKbsKOJDyBOBtI26LJGn8LAP2pjxUaNOI2zJpzLGSpNmYY/tjfpUkzWbW/DpORdkDKU+jlSRpNo8APj/qRkwYc6wkqRfm2Pkxv0qSetExv45TUfYGgP/5n1vZvr3zw8f23HMPbrrpllobVQfjmjxNja2pcUFzY2tqXLB7bEuXLuGOd7wdVPlC8zJnjm3XtH3LeMZf02IynvFmPLsyx/Zt
3vkV3P/GnfGMN+MZf02LaSHxzJVfx6kouw1g+/Ydsya0+SS7SWJck6epsTU1LmhubE2NC7rG5u2B89dTjm3XtH3LeMZf02IynvFmPB2ZY+enr/w685kmMZ7xZjzjrWnxQPNiGkA8HfOrD/qSJEmSJEmSpBpZlJUkSZIkSZKkGlmUlSRJkiRJkqQaWZSVJEmSJEmSpBpZlJUkSZIkSZKkGlmUlSRJkiRJkqQaWZSVJEmSJEmSpBpZlJUkSZIkSZKkGlmUlSRJkiRJkqQaLR91AxaDqbVrWL2qrOqNm7ay/uYNI26RJEmTrzW/gjlWkqRBMcdK0vBZlK3B6lXLOeLECwG46JR1rB9xeyRJaoLW/ArmWEmSBsUcK0nD5/AFkiRJkiRJklQji7KSJEmSJEmSVCOLspIkSZIkSZJUI4uykiRJkiRJklQjH/Q1BO1PqpQkSYNhjpUkaTjMsZJUL4+4Q9DpSZWSJGnhWnOs+VWSpMExx0pSvRy+QJIkSZIkSZJqZFFWkiRJkiRJkmrk8AV9ah9vZ+Omray/ecMIWyRJUjO05ljzqyRJg+E5rCSNF4uyfWofN/b8Nx3O9PTUCFskSVIztOZY86skSYPhOawkjReLsgOycsUyB0WXJGnAWvMrmGMlSRoUc6wkjZZjykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNlo+6AZNiau0aVq9ydUmSNGjmWEmShsMcK0njq6ejc0S8Cnhq9fKSzHxpRLwPeDhwa/X+azLzgog4FDgVWAOcm5knDbrRo7B61XKOOPHCna8vOmXdCFsjSWoC82thjpUkDZo5tmjNseZXSRovcxZlqwR1GPBgYAdweUQ8CXgI8MjMvKFl3jXAmcCjgJ8Cl0TE4zPzsmE0XpKkSWV+lSRpOMyxkqRJ0MuVsjcAJ2bmZoCI+D5w9+q/MyPirsAFwGuAg4AfZeZ11bznAEcCJjRJknZlfpUkaTjMsZKksTdnUTYzvzfz74i4D+UWkEcAjwaeB/wSuBh4DnALJQHOuAHYZ3DNlSSpGcyvkiQNhzlWkjQJeh7xOyLuB1wCvCQzE3hSy7R3AM8AzqPcHjJjCbB9Pg3ac889Zp0+PT01n8WNpU4xNCGuTpoaFzQ3tqbGBc2NralxQbNjm1FXfoW5c2y7SVz/s7V5EuOZTdPigebFZDzjzXiab1zOYTuZxO1ljp1cxjPemhYPNC+mYcXT64O+DgHOB/48Mz8cEQ8A9s3M86tZlgBbgOuBvVs+uhfw8/k06KabbmH79h0dp01PT3Hjjevns7iBGeQGaI9hlHENU1PjgubG1tS4oLmxNTUu2D22pUuX9HXSM87qzK8we45tV+e+Ncwc2/odTfpbaVo80LyYjGe8Gc+uzLHDO4ftxBw73oxnvBnP+GtaTAuJZ6782suDvu4GfAw4KjM/Xb29BHhbRHyacrvHCcDZwNXlI3Fv4DrgaMqg6ZIkqYX5VZKk4TDHSpImQS9Xyr4YWA2cGhEz770HeCPwBWAFcH5mfgggIo6l/CK5GriUcjuIJEnalflVkqThMMdKksZeLw/6eiHwwi6T39Vh/iuBBy2wXZIkNZr5VZKk4TDHSpImwdJRN0CSJEmSJEmSFhOLspIkSZIkSZJUI4uykiRJkiRJklQji7KSJEmSJEmSVCOLspIkSZIkSZJUI4uykiRJkiRJklQji7KSJEmSJEmSVCOLspIkSZIkSZJUI4uykiRJkiRJklQji7KSJEmSJEmSVCOLspIkSZIkSZJUI4uykiRJkiRJklQji7KSJEmSJEmSVCOLspIkSZIkSZJUI4uykiRJkiRJklSj5aNuwGI3tXYNANPTUwBs3LSV9TdvGGWTJElqhKm1a1i9qnR1Nm/ZNuLWSJLUDK35FcyxktQvi7I127xl284C7IwjTrxw
578vOmUd6+tulCRJDTBbjr3olHWjaJIkSY3QnmPbz2ElSfNnUbZmK1csM4FJkjQE5lhJkoajNceaXyVpMBxTVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqtHzUDdCuNm/ZxvT0FACbNm9j1cplO6dt3LSV9TdvGFXTJEmaWK35FXbNseZXSZL65zmsJPXHouyYWbliGUeceCEAF52ybue/Z16vH1XDJEmaYK35FXbNseZXSZL65zmsJPXH4QskSZIkSZIkqUYWZSVJkiRJkiSpRj0NXxARrwKeWr28JDNfGhGHAqcCa4BzM/Okat79gdOBtcBngedm5taBt1ySpAlnfpUkaTjMsZKkcTfnlbJV4joMeDCwP3BARPwxcCawDtgPODAiHl995Bzg+Zm5L7AEOH4YDZckaZKZXyVJGg5zrCRpEvQyfMENwImZuTkztwDfB/YFfpSZ11W/IJ4DHBkR9wDWZOaXq8+eBRw5hHZLkjTpzK+SJA2HOVaSNPbmHL4gM7838++IuA/lFpB3UBLdjBuAfYC7dHm/Z3vuuces06enp+azuMaZtPgnrb3z0dTYmhoXNDe2psYFzY6t7vwKc+fYdk1e/+0mMdZJbPNcmhaT8Yw342mucTuH7WQxba9JjHUS2zwb4xlvTYsHmhfTsOLpaUxZgIi4H3AJ8BJgK+WXxhlLgO2UK293dHi/ZzfddAvbt+/oOG16eoobb1w/n8UNzLjsUKOKvx+j3F7D1tTYmhoXNDe2psYFu8e2dOmSvk56xl1d+RVmz7Ht6ty3xiHHTtrfURP/9psWk/GMN+PZlTl2eOewnZhjx5vHh/FmPOOvaTEtJJ658msvwxcQEYcAVwIvy8yzgeuBvVtm2Qv4+SzvS5KkNuZXSZKGwxwrSRp3vTzo627Ax4CjM/PD1dtXl0lx74hYBhwNXJaZPwE2VgkQ4OnAZUNotyRJE838KknScJhjJUmToJfhC14MrAZOjYiZ994DHAucX027FDivmvY04LSIWAt8A3j7ANsrSVJTmF8lSRoOc6wkaez18qCvFwIv7DL5QR3m/xZw0ALbJUlSo5lfJUkaDnOsJGkS9DSmrCRJkiRJkiRpMHoZvmDRmlq7htWrXEWSJA2aOVaSpMEzv0rS5PBoPYvVq5ZzxIkXAnDRKetG3BpJkprDHCtJ0uC15lcwx0rSOHP4AkmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqoqvQXgAABriSURBVNHyUTdA/Zlau4bVq27bfBs3bWX9zRtG2CJJkprBHCtJ0nC05ljzq6TFzqLshFq9ajlHnHjhztcXnbKO9SNsjyRJTWGOlSRpOFpzrPlV0mJnUXaCbN6yjenpqVE3Q5KkRjG/SpI0HOZYSerOouwEWbli2S6/KkqSpIVrza9gjpUkaVDMsZLUnQ/6kiRJkiRJkqQaWZSVJEmSJEmSpBpZlJUkSZIkSZKkGlmUlSRJkiRJkqQaWZSVJEmSJEmSpBpZlJUkSZIkSZKkGlmUlSRJkiRJkqQaWZSVJEmSJEmSpBpZlJUkSZIkSZKkGlmUlSRJkiRJkqQaWZSV
JEmSJEmSpBpZlJUkSZIkSZKkGlmUlSRJkiRJkqQaLe91xohYC3wRODwzfxwR7wMeDtxazfKazLwgIg4FTgXWAOdm5kmDbrQkSU1hfpUkaTjMsZKkcdZTUTYiDgZOA/ZtefshwCMz84aW+dYAZwKPAn4KXBIRj8/MywbXZEmSmsH8KknScJhjJUnjrtcrZY8H/hR4P0BE/Apwd+DMiLgrcAHwGuAg4EeZeV013znAkYAJTZKk3ZlfJUkaDnOsJGms9VSUzczjACJi5q29gE8DzwN+CVwMPAe4Bbih5aM3APvMp0F77rnHrNOnp6fms7hFZRzXzTi2aVCaGltT44LmxtbUuKDZsUG9+RXmzrHtmr7+52Mc18U4tmmhmhaT8Yw342m2cTqH7cTtVYzrehjXdvXLeMZb0+KB5sU0rHh6HlO2VWb+G/CkmdcR8Q7gGcB5wI6WWZcA2+ez7JtuuoXt23d0nDY9PcWNN66fd3v7NWk7UZ3rphd1b686NTW2psYFzY2tqXHB7rEtXbqkr5OeSTLM/Aqz59h2w963zLEL08S//abFZDzjzXh2ZY4d3jlsJ8Pc/8yvC+fxYbwZz/hrWkwLiWeu/Lq0n4VGxAMi4sktby0BtgDXA3u3vL8X8PN+vkOSpMXG/CpJ0nCYYyVJ46avK2UpCextEfFpyu0eJwBnA1cDERH3Bq4DjqYMmi5JkuZmfpUkaTjMsZKksdLXlbKZ+W3gjcAXgGuBazLzQ5m5ETgWOL96/weU20EkSdIczK+SJA2HOVaSNG7mdaVsZt6z5d/vAt7VYZ4rgQctuGWSJC0S5ldJkobDHCtJGld9XSkrSZIkSZIkSeqPRVlJkiRJkiRJqpFFWUmSJEmSJEmq0bzGlG26qbVrWL3KVSJJ0qCZYyVJGg5zrCRNJo/cLVavWs4RJ1648/VFp6wbYWskSWoOc6wkScPRmmPNr5I0ORy+QJIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqtHzUDdBgbN6yjenpKQA2btrK+ps3jLhFkiQ1gzlWkqTBa82vYI6VtPhYlG2IlSuWccSJFwJw0SnrWD/i9kiS1BTmWEmSBq81v4I5VtLi4/AFkiRJkiRJklQji7KSJEmSJEmSVCOLspIkSZIkSZJUI4uykiRJkiRJklQjH/TVQD7FUpKk4TDHSpI0HK051vwqaTGwKNtAPsVSkqThMMdKkjQcrTnW/CppMXD4AkmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmq0aJ/0NfU2jWsXrXoV4MkSQNlfpUkaTjMsZLUDIv+SL561fJdnvAoSZIWrjW/gjlWkqRBMcdKUjM4fIEkSZIkSZIk1ciirCRJkiRJkiTVyKKsJEmSJEmSJNXIoqwkSZIkSZIk1ciirCRJkiRJkiTVyKKsJEmSJEmSJNXIoqwkSZIkSZIk1ciirCRJkiRJkiTVyKKsJEmSJEmSJNXIoqwkSZIkSZIk1Wj5qBug4du8ZRvT01M7X2/ctJX1N28YYYskSWqG1hxrfpUkaTA8h5W0GFiUXQRWrljGESdeuPP1RaesY/0I2yNJUlO05ljzqyRJg+E5rKTFoKeibESsBb4IHJ6ZP46IQ4FTgTXAuZl5UjXf/sDpwFrgs8BzM3PrUFouSVIDmGMlSRoOc6wkaZzNOaZsRBwMfB7Yt3q9BjgTWAfsBxwYEY+vZj8HeH5m7gssAY4fRqMlSWoCc6wkScNhjpUkjbteHvR1PPCnwM+r1wcBP8rM66pfD88BjoyIewBrMvPL1XxnAUcOuL2SJDWJOVaSpOEwx0qSxtqcwxdk5nEAETHz1l2AG1pmuQHYZ5b352XPPfeYdXrrYN/qX13rscnbq6mxNTUuaG5sTY0Lmh0bjF+Obdf09T8Mda6zJm6fpsVkPOPNeJqt
zhw73/wKbq9+mGP7ZzzjrWnxQPNiGlY8/Tzoaymwo+X1EmD7LO/Py0033cL27Ts6TpuenuLGGwc7vHfTdpReDXo9djKM7TUumhpbU+OC5sbW1Lhg99iWLl3S10nPhBlZjm230H3L/DpcTfzbb1pMxjPejGdX5tiF5dj55Fcwx/bLHNsf4xlvTYsHmhfTQuKZK7/2MnxBu+uBvVte70W5JaTb+5IkqTfmWEmShsMcK0kaK/0UZa8GIiLuHRHLgKOByzLzJ8DGiDikmu/pwGUDaqckSYuBOVaSpOEwx0qSxsq8i7KZuRE4FjgfuBb4AXBeNflpwFsj4gfAHsDbB9NMSZKazxwrSdJwmGMlSeOm5zFlM/OeLf++EnhQh3m+RXmqpSRJ6pE5VpKk4TDHSpLGVT/DF0iSJEmSJEmS+mRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSarR81A1Q/TZv2cb09BQAGzdtZf3NG3ZOm1q7htWrlnecJkmSumvNr7BrHm3Nr+3TJEnS7Ho9h+00XZLGlUXZRWjlimUcceKFAJz/psN3OYEEdk676JR1rK+9dZIkTabW/Aq759jWaeZYSZJ61+s5LJhjJU0Oi7KLXPsJ5EWnrBthayRJao7WHGt+lSRpMDyHldQUjikrSZIkSZIkSTWyKCtJkiRJkiRJNbIoK0mSJEmSJEk1sigrSZIkSZIkSTWyKCtJkiRJkiRJNbIoK0mSJEmSJEk1sigrSZIkSZIkSTWyKCtJkiRJkiRJNbIoK0mSJEmSJEk1sigrSZIkSZIkSTWyKCtJkiRJkiRJNbIoK0mSJEmSJEk1sigrSZIkSZIkSTWyKCtJkiRJkiRJNVo+6gZofG3eso3p6amdrzdu2sr6mzeMsEWSJDWDOVaSpOFozbHmV0njzKKsulq5YhlHnHjhztcXnbKO9SNsjyRJTWGOlSRpOFpzrPlV0jhz+AJJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmq0fKFfDgirgLuDGyp3voT4DeAk4AVwNsy850LaqEkSYuQOVaSpMEzv0qSxkXfRdmIWALsC9wjM7dW790V+DBwALAJ+GJEXJWZ1w6isZIkLQbmWEmSBs/8KkkaJwu5Ujaq/18REXsCpwHrgU9n5n8DRMR5wFOA1y6olZIkLS7mWEmSBs/8KkkaGwsZU/aOwJXAk4DHAM8F7g7c0DLPDcA+C/gOSZIWI3OsJEmDZ36VJI2Nvq+UzcwvAV+aeR0RZwCnAn/dMtsSYPt8lrvnnnvMOn16emo+i9OAzXf9N3l7NTW2psYFzY2tqXFBs2ObzahybLvFuv5HxRzbvJiMZ7wZz+IzLvkV3F516mddN237GM94a1o80LyYhhXPQsaUfTiwKjOvrN5aAvwY2Ltltr2An89nuTfddAvbt+/oOG16eoobb1w//8bOomk7yrDNZ/0PY3uNi6bG1tS4oLmxNTUu2D22pUuX9HXSM4lGkWPbLXTfMr/O32LPsU2LyXjGm/HsarHk2HHIr2COrdt817XHh/FmPOOvaTEtJJ658utCxpS9A/DaiPhtylMqnwkcA5wTEdPArcCTgRMW8B0DN7V2DatXLSRsSZKGzhwrSdLgmV8lSWNjIcMXXBwRBwPfBJYB78zML0TEK4CrgJXA6Zn5lcE0dTBWr1rOESdeuPP1RaesG2FrJEnaXRNyrPlVkjRumpBfwRwrSU2xoJ/bMvNk4OS29z4IfHAhy5UkabEzx0qSNHjmV0nSuFg66gZIkiRJkiRJ0mJiUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaS
JEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmq0fNQN0GSaWruG1atu2302btrK+ps3jLBFkiQ1gzlWkqThaM2x5ldJo2ZRVn1ZvWo5R5x44c7XF52yjvUjbI8kSU1hjpUkaThac6z5VdKoOXyBJEmSJEmSJNXIK2XVs81btjE9PTXqZkiS1DjmWEmSBs/8KmmcWZRVz1auWLbLrR6t2pPdxk1ba22bJEmTrNcc6/h3kiT1rjW/wq45ttM5rDlWUp0symogZkt2kiSpf+0FW8e/kyRp4Tqdw5pjJdXJMWUlSZIkSZIkqUYWZSVJkiRJkiSpRg5foNpNrV3D6lVl13PcHkmSBqM1v4I5VpKkQTHHShoGi7Kq3epVyx0bT5KkAWvNr2COlSRpUMyxkobB4Qs0FJu3bANgenqKqbVrRtwaSZKaofVJ0eZYSZIGxxwrqW5eKauh8EnRkiQNnk+KliRpOMyxkurmlbKSJEmSJEmSVCOLspIkSZIkSZJUI4uykiRJkiRJklQji7KSJEmSJEmSVCOLspIkSZIkSZJUo+WjboCab/OWbUxPT/U0beOmray/eUNdTZMkaaKZYyVJGo5ec6z5VVK/LMpq6FauWMYRJ1648/VFp6ybddr6WlsnSdLkas2jrfm1fdrMdHOsJEm96TXHml8l9cuirMaKV/VIkjQcXtUjSdLgeQ4rqV8WZTVWvKpHkqTh8KoeSZIGz3NYSf1aFEXZqbVrWL1qUYQqSVJtzK+SJA2HOVaSmm9RHOVXr1redSwYSZLUn9b8CuZYSZIGxXNYSWq+paNugCRJkiRJkiQtJoviSllNLh9KIknS4PlQEkmShsNzWEm9siirseZDSSRJGjwfSiJJ0nB4DiupVw5fIEmSJEmSJEk18kpZTQxvtZQkaTi81VKSpMHzHFbSbCzKamL0e6vl1No1rF51265uIpQkaVf93mppjpUkqbuFDBfUmmPNr1IzWZRV461etdxx8yRJGgJzrCRJw9GaY82vUjNZlFUjzOdKHW8hkSSpd/3mWPOrJEmz6/VqWM9hpWYaSlE2Io4GTgJWAG/LzHcO43u6aT95UDO1J6Zer9Rpv4Xk/DcdvnM5mzZvY9XKZTun1Z3s5jrx9RYWSeZYDVt7foX+cmxrfoVdc+wocthsOdZhGCSZX1WH2XJsv+ewMF451nNYqXcDP+pHxF2B1wMHAJuAL0bEVZl57aC/q5tOt9KpedrHvxvUcrqdeLZ3lKbWrhl4QpnrNtB+b2HxZFNqhnHLsebXZuo0/t2gltMth02tXQMw1KtsZ8uxCxmGwZNNafKNW34Fc2xTjSLH1nEXy2znqZ7DSt0N46e4Q4FPZ+Z/A0TEecBTgNfO8bllAEuXLpl1prmmz7jzHdd0fT2MaeO2nKZ8R7/Laf8Fst/vaF/Oc/76ip3/fvdfPqbrr5ObNm3llls27py2xx6rWVUllPZp7Wb7/tbp7dNm+/7Vq5bv0vYzTjqMW6u/pda2Aaxdu2bW9g3bfNbVfPV6/Jg04xhX+37V77Zsja3l38s6zrw4DDXHtus2/zjliXHORU39jvnk2G7TOuW3bjm2/S6W1uPJfI81vcbROm2274ddc2xrfoWSU4eV0/qx0Bw7jvlmISY1nk77PSwsHnNsvfm122dGfXwf1HKa8h11t3W2c7+FLmcmT/V7Dttperte4pjtHLb9O2Y7h23Nr720bdgWeu4zqfloNpMaU7e+Ur/xzJVfl+zYsaOvBXcTES8HbpeZJ1WvjwMOyswT5vjow4HPDbQxkqQmegTw+VE3YhTMsZKkIVuUOdb8Kkkaso75dRhXyi4FWiu9S4DtPXzuq5RG3gBsG0K7JEmTbRmwNyVfLFbmWEnSMCz2HGt+lSQNw6z5dRhF2espiWnGXsDPe/jcJhbhr7KSpHn511E3YMTMsZKkYVnMOdb8Kkkalq75dRhF2U8Br46IaeBW4MnAXLd9SJKkuZljJUkaPPOrJKl2Swe9wMz8
GfAK4CrgGuCDmfmVQX+PJEmLjTlWkqTBM79KkkZh4A/6kiRJkiRJkiR1N/ArZSVJkiRJkiRJ3VmUlSRJkiRJkqQaWZSVJEmSJEmSpBpZlJUkSZIkSZKkGlmUlSRJkiRJkqQaLR91A3oREUcDJwErgLdl5jtH3KQ5RcSrgKdWLy/JzJdGxKHAqcAa4NzMPKmad3/gdGAt8FnguZm5NSLuDpwD3BlI4GmZeUvNoXQUEW8B7pSZx863/RFxB+ADwK8DNwJPzcz/GEkgLSLiCOBVwO2AKzLzhU3YZhFxDPDy6uVlmfniSd9mEbEW+CJweGb+eFDbadRxdojrBOAFwA7ga8CfZObmSY+r5f3nA0/JzEdXr+fV/ohYCZwBPATYABydmT+oK66mi4hDgLcCK4GbgGdn5k9G26r+TGI/Yjad+hijbM+gtPYtRt2WhejUnxhxkxakUz9ilO3pV699h0nRa59hlG1Ud03JsebXydCU/Arm2HFlju3f2F8pGxF3BV4PPBzYHzghIu472lbNrtoBDwMeTGnzARHxx8CZwDpgP+DAiHh89ZFzgOdn5r7AEuD46v13Ae/KzN+kbPiT64uiu4h4DPDMlrfm2/6/Bj6XmfsBpwF/V0vDZxERvw68B/gD4IHAb1XbZ6K3WUT8CvB24FHAg4BHVPvnxG6ziDgY+Dywb/V6DYPbTiOLs0Nc+wIvAX6bsk8uBf60mn1i42p5/77Ay9pmn2/7XwDcWr3/58BZw4hhEfsAcFxm7l/9++0jbk9fJrEfMZsufYwnjbZVC9ehbzGRZulPTKRZ+hETZZ59h7E3zz6DxtPE51jz62RoSn4Fc+y4MscuzNgXZYFDgU9n5n9n5q3AecBTRtymudwAnJiZmzNzC/B9ygb9UWZel5lbKUWVIyPiHsCazPxy9dmzqvdXAI+kxLvz/Rpj6CgifpWSfN9Qve6n/b9P6XwAfAh4fDX/KD2J8gvO9dU2Owr4PyZ/my2j/J3fjvIL9gpgC5O9zY6nHAR/Xr0+iMFtp1HG2R7XJuB5mXlzZu4AvgPcvQFxERGrgPcCr2x5r5/273w/Mz8LTFdX22qBqm10UmZ+u3rr28CkrttJ7EfMplMfY1K3DbB732LCdepPXD3iNi1Ep37EhpG2qD899R1G1bg+9NRnGFXjNLsG5Vjz65hrWH4Fc+y4MscuwCQMX3AXygFyxg2UjTy2MvN7M/+OiPtQboF4B7vHsQ+d49sHuBNwc7UTt74/au8FXgHcrXrdT/t3fqa65fpmYJqWos0I3BvYHBEfp/yBXQx8jwnfZpm5PiJOBn5AKTL/M7CZCd5mmXkcQETMvNVte0zUvtkeV3UL20+q96aB5wPHMuFxVd5I+fX0upb3+ml/t3Xx74ONYvHJzE2UDhQRsRR4NfCxUbZpASauHzGbLn2MQ0bXooFo71tMsk79ibG406kfXfoRXxxtq+ZvHn2HiTCPPoPGUINyrPl1/DUpv4I5diyZYxdmEq6UXUoZt2HGEmD7iNoyLxFxP+CTlEud/43OcXSLr/19GHHcEXEc8NPMvLLl7X7av6Tt/XHYpsspv/Y+B3gYcDBl/MpJ32YPBJ4N3INycNxGuS2nCdtsRq/bYyLjrG4NuxI4IzM/w4THFRGPBe6eme9rm9RP+yc2P4yTiDgyIq5v++9T1bSVlKuRlzO5V1k0cj9p7WNk5o9G3Z5+delbTLJO/YmJvW20Sz9iIse7a9PU40J7n0Ej1vAc29S/I/Pr+DLHToamHhuGkmMnoSh7PbB3y+u9GO0VlT2JMnj7lcDLMvNsusfR7f1fALePiGXV+3sz+riPAg6LiGuA1wJPBI5j/u3/WTUfEbEcmKIMcD9K/wF8KjNvzMwNwAWUA/6kb7PHAVdm5i+qX+XPAh5NM7bZjEH+bY1VnBHxm5RfS8/OzNdVb096XH8M3K86jpwOPCQizqW/9k9kfhg3mfmRzNyn7b9DI2IP4HJKB3hddZvYJGrcftKhjzHJ
dutbRMRbR9ymhejUn5jYK8fo3o+YdE08LnTqM2jEGp5jm/h3ZH4db+bYydDEY8PQcuwkFGU/BTwmIqargZCfTElgYysi7ka5BeXozPxw9fbVZVLcuyo6HE15ut5PgI1VAgB4evX+FuBzlIMpwDOAy2oLooPMfGxm3r8akP6VwMcz81nMv/2XVq+ppn9uDDoiFwOPi4g7VNvn8ZRxkSZ6mwHfAg6NiNtFxBLgCMptEU3YZjMG+bc1NnFGxBRwBWXMsVNm3p/0uDLz2Zm5X3UcOQ74WmYe1Wf7d74fEQ8HNmamQxcMzjnAvwBHVR3FSTVx/YjZdOljTKwufYsXjbpdC9CpP/H1EbdpITr1I7464jYNQse+w4jb1LdufQaNtSbkWPPrGGtgfgVz7KQwx87D2BdlM/NnlHFQrgKuAT6YmV8Zbavm9GJgNXBqRFxT/Tp1bPXf+cC1lHFDZh5o8zTgrRHxA2APbnv65vMoT7G8FngEcFJdAczTfNt/MvDQiPheNc/Inw6bmVcDf0t5yt61lDFD3s2Eb7PMvILyYKSvUx4isAJ4Ew3YZjMycyOD207jFOdxwK8BJ84cRyLitdW0SY5rNvNt/zuAVdX7b6cUqDUAEfFgyhNTDwG+Ue1/l464WX2Z0H7EbHbrY0TEc0fdKBVd+hPtw7VMjFn6ERNtjr7DJJqtz6Ax05Qca35V3cyxk8EcOz9LduxoH8JPkiRJkiRJkjQsY3+lrCRJkiRJkiQ1iUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmq0f8HmdXyZGoBQWwAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from sklearn.preprocessing import StandardScaler\n", "\n", "x = house_df['sqft_living']\n", "mu = x.mean()\n", "sigma = x.std()\n", "z = (x-mu)/sigma\n", "\n", "# reshaping x to be a n by 1 matrix since that's how scikit learn likes data for standardization\n", "x_reshaped = np.array(x).reshape(-1,1)\n", "z_sklearn = StandardScaler().fit_transform(x_reshaped)\n", "\n", "# Plotting the histogram of the variable before standardization\n", "fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(24,5))\n", "ax = ax.ravel()\n", "\n", "ax[0].hist(x, bins=100)\n", "ax[0].set_title('Histogram of sqft_living before standardization')\n", "\n", "ax[1].hist(z, bins=100)\n", "ax[1].set_title('Manually standardizing sqft_living')\n", "\n", "ax[2].hist(z_sklearn, bins=100)\n", "ax[2].set_title('Standardizing sqft_living using scikit learn');\n", "\n", "# making things a dataframe to check if they work\n", "pd.DataFrame({'x': x, 'z_manual': z, 'z_sklearn': z_sklearn.flatten()}).describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Min-Max Scaler (Normalization)\n", "\n", "[Here's](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) the scikit-learn implementation of the standard scaler. What is it doing though? \n", "\n", "$$\n", "x_{new} = \\frac{x-x_{min}}{x_{max}-x_{min}}\n", "$$\n", "\n", "In the above setup: \n", "\n", "- $x_{new}$ is the normalized variable\n", "- $x$ is the variable before normalized\n", "- $x_{max}$ is the max value of the variable before normalization\n", "- $x_{min}$ is the min value of the variable before normalization\n", "\n", "Let's see an example of how this works:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr><th></th><th>x</th><th>x_new_manual</th><th>x_new_sklearn</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>count</th><td>4000.000000</td><td>4000.000000</td><td>4000.000000</td></tr>\n",
"    <tr><th>mean</th><td>2096.645250</td><td>0.130180</td><td>0.130180</td></tr>\n",
"    <tr><th>std</th><td>957.785141</td><td>0.072802</td><td>0.072802</td></tr>\n",
"    <tr><th>min</th><td>384.000000</td><td>0.000000</td><td>0.000000</td></tr>\n",
"    <tr><th>25%</th><td>1420.000000</td><td>0.078747</td><td>0.078747</td></tr>\n",
"    <tr><th>50%</th><td>1920.000000</td><td>0.116753</td><td>0.116753</td></tr>\n",
"    <tr><th>75%</th><td>2570.000000</td><td>0.166160</td><td>0.166160</td></tr>\n",
"    <tr><th>max</th><td>13540.000000</td><td>1.000000</td><td>1.000000</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " x x_new_manual x_new_sklearn\n", "count 4000.000000 4000.000000 4000.000000\n", "mean 2096.645250 0.130180 0.130180\n", "std 957.785141 0.072802 0.072802\n", "min 384.000000 0.000000 0.000000\n", "25% 1420.000000 0.078747 0.078747\n", "50% 1920.000000 0.116753 0.116753\n", "75% 2570.000000 0.166160 0.166160\n", "max 13540.000000 1.000000 1.000000" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABWUAAAE/CAYAAAAuSaGoAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nOzdeZhkVX3/8ffsM4EZF2wD7hrkK26gbO4aRY3KOFFEEkBFBfWnRo2jRiO4xiXRwS2KhkVQXFAQcQQUBYy4gKIiCPrVKBrRMRJcGMjsM78/zu2hpqequrq76tbS79fz8DBV91bVOadu30/db917as62bduQJEmSJEmSJNVjbr8bIEmSJEmSJEmziUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFF2RESEdsi4g4T7js6Ir5Y/fstEfHsSZ7jDRGxopft7JWI2Dcifh4R34uIe8zwueZFxLkR8dOIeGmn4zL+HkTEUyPi/ZOse3JEHDyTdjY815si4t+n8bjXR8R/R8Sp3WjHIIqIe0TEzdW/XxQRr53Bc104/jcWEedHxH271U5Jaqfal22LiP9ssuy0Zp8BamjTaRHxqurftb/+TE01H/q9368+0/13RHw5Ig6IiA938JjG9+jKiLhtm3Un/ewiSf3WkIfPn3D/qyLitD60p/F4u+3xXUTcKSK+VV/rmrbhxIi4LiLeFhHHRMSLO3jMLyNi/+q/syZZd9KaQy/18/UbczQivhYRz2iyzvZtJCJOioj9mqyzfZvS6Jvf7waoPpn5hg5Weyxwba/b0iNPBS7JzGO68Fx3Bp4I7JKZWyLia0xhXDLzC8AXJlmnG+2cqecDR2TmN/rdkDpk5qQHsJN4fMNzPXmGzyVJU7UeiIi4e2b+inJjF+Dh/W3W8OskHwZgv/9s4J8z84yIOBq4y1QenJn7TrJ80s8ukjQgtgKrIuIbmZn9bsy4yY7vMvO3wMNqak4rLwTulpnXV0XsH3X6wMy8Atip0DhhnU5qDj3Tz9efRg3g8cBHetooDTyLsrPI+E43M98dEW8GngZsBG4EjgaeDuwPvCsitgAXAx8E9gW2ARdQDgY2R8STgX8FtgBXAgcDjwAeQyn07QL8GTgEOBG4N7AbsJZSBMyq0Pk94CHAHYH/AHYHHl09/pmZeXWTfhwP/D2wGfgp8FLgccCLgXkRsSQzj5zwmJ36m5lrIuJQ4K3AOuA84J+B2wFfAhYA34uIkxrHJTPP6WCsj6YE1iuBbwF3ysyNETEP+O9qvE4E/h24ArgIOB84qHr912TmORHxF8CHqzH6E1VhODOPbvKye0fE14HbAz8AXpyZayPiztXr3K3q06cz8+0RcSblgO6UiHgD8M2qTfcA5gCnZ+a7qrOOLwV+XC17NHBPyvu/C2UbeHNm7vRtXkSsB94JPAHYA/i3zDyxWrbT+5iZv6u2iz8A96nacygdbCcR8R
Dg34BF1Wt9JTMnfoP+JuAOVdtXNyzaHdiUmXeNiEMo28HC6vVOz8zjI+Kj1bqXVNv/pcAzMvOKiHgB8LJqLP6n6stPq7+5m4AHAHcFrgKenZk37/z2SdKktgBnAkcCb6/uezpwLrASICLmAu+h7DOXUvbnx2TmN9vtkyJiGzCWmf9bPc82YIyyP276fM0aGBFfAT6TmSdVt48DdsvMf5yw3jDmwy8p2b4r8DbgF8D9Kdn6wmqMx4CPAn9F+bzxO8pnrzdNeL1HACcA8yifsd6RmWdXmX0acCfgV5T3/GxgH+BA4J4RcU/gWOA2EfHRzHxus/dioob39AvAqsw8u7r/X6tVfkzJtUOqsf42peB/N+CrwAsyc2v1Gee1lM9OFwMvz0yPKSTVaR2wCvhkRDw0Mzc2LoyI29D6OHYDJTf3oeTpNyj744Mp+/c3AYdRsvK3wPLMvCUinkcpaC6kHG+9czy3Gl73a5Tjri3AGxsW7QmcAxxPyYRdq9y5ByWX7g78BjiqOkY9gJJzC4GfV8tfmZlfm/B6/w94EeUYdz0li66NiEcB76/6fhnwJMpx+scpOX5BRHyGclLT4yNiXWZ+cLJBj4jHVP17OPBrYK/M/F217PJq7A7n1ppD06yvjonfVb3+n4HLgftm5mMmvN7RVLk08XabHD2ti6+/O/AxyucDgPMy8/hq2euA51A+q/yMUk95WmN7q/XmA58ENlXrf7UawwdRsv4TEfHszLy8xZjfBngfZXtcQKkbvLralptuk9U4NdZkTq/atpVSl/k/4DmZ+eNmr6l6OX3B6Lmkujztyoi4EnjLxBUi4q7AK4ADMnN/4ELgoGpHfAXlj/wcyo78RsoOYH9KcL0qInaj7NCPqs66uIRyZum4+wGPycy/pgTAnzLzoZm5F/BdShF13D0y8+HAUZQDpq9VbfoS8A9N2v7c6jkPyMwHUr7ZOy0zP0EpXp7ZpCDbtL8RsQdwKmXHuR+wAZiXmWuBJwPrMnPfJuPSscz8KXANZYcPJRCua7IDvBfw5cw8kHKg897q/uMpX57ch/JB4UFtXm5PygHqAyhhe1x1/8eBU6s+HggcHBHPzMzDKR80jszMM4FPUM40fgAlaI+KiL+rnuMuwFur93A95YDzWZn5YGAFcGJE3K1JmxYB/5uZD6McyL4nIha3eh8bHvfHzLxvZn6gut3JdvJy4A2ZeRBwX+CpzS4HAcjMX1fv7b6UgFoPHBkRcyiFjedUz/8Q4HURcYeGg96/zsxfjz9XRDwWeE11/z6U0P189VwA+wF/A+xN+eB1WLM2SVKHPgY8q+H2c9hx/3kQ5UP+QzPzvpQP4o2X5U91nzTZ8030QUrBcLxA/HxKPk80VPnQYlxWZeaDKJk4XiR/P3BNZu5NGdtWZ0S9GTihyubnUa5UgnIQfllm3o/y2eWxVbv+kVs/i7wVeANwaacF2QlOAp4LZbomytid3GS9v6IcxD+Q8p48Osr0Df8KHFz1/SbKAbEk1e1twM3cuv9t1PQ4tlq2EFidmZHlzM9FwO+q47DTKfvDV1Dy4jbAiojYlZJtT672fYdTMqepzDynIUuOp3xB96omqz4SOCwz7wPcAryoKuJ9Dji+ysH3U4rLO6j23+8F/iYzD6B8MfmIiFgInAW8qmrrNyhFXTLzkdXD/7rKki8A7+mkIDuhf3+mFJmPqtqyN+VLzC9PWLVp1gPHUD6P3B94KCVvpqpVjnbz9Y8FflEd8z4SuHdE3CYinkopwj40M+8PXMeONY5xC4HPAr+n1E42jy/IzNdz67F404Js5T3A96p+PohSIH5lB9tkY00GyhfW/1C193Laf5ZTjSzKjp6/Hg+AKgSanb7/G+CHwPcj4t3AlZn5+SbrPQn498zclpkbKAdVTwIeBVybmT8EyMzTKR/Kx12VmTdVy84CTouIf4iI91E+3O/asO7nqv//vPr/lxpu375Fmz6ambdUt98HPK4Kn1Za9ffhVVvHpyWY8p
ysHTqZstOGchB0UpN1NlHOlAX4Prf2/cnAKZm5tRrT09u8zucy84bM3EY5QHx8dVnro4G3VkX6yyhnvOwQ7A2Xv34QtgftaZTxhvIN4Lerfz+U8k3j56vnPJ/y7eQDW7Tr3IZ+LaJ8YzfZ+3jpxL5V/2+3nTwHuG1E/DPwIWAJO25rO4ky/+EFwOsy8+vV2C0H9ouIN1K+fZ1TtbmVv6F8GXADQGaeRvmS4h7jbc3MDZm5Cbia5tu1JHUkM78HbImI/aovHZdm5o8aln+b8qXcC6vMGz+zc9yU9kkdPN9Eq4G/jIh9KNMAXZfZ8tLSocmHJqv8KjOvbGh/Y27/B0BmrqEcGDfzGeCDEfEJyoHhP1f3P5aqCF19sfuVdu2cpjOBh1ZnAD0R+Glm/qzJeqsbPn/8F6WPTwQuzMzrq3U+0ORxktRzmbmVUhR8bkQ8fsLiVsex4yZmydnV/38OXJ2Zv6me/zrg9lmucjsEeEpEvBV4PZPkCEB1pcaJlLNt/6fJKl8bP26mXOl4e0ohmcy8oPr/JTSZYiAzt1AKft+K8tsifwJOoRyTbcjMr1brfaJa1m0nU/IVyjHuqdWYTdQs658MfCwz12c5y3k6l/C3ytFuvv6XgEMj4nzKGamvrY6TDwY+m5l/BMjMV2bm25o8fhVlioK3VseZ03EI5TPYlZSrgw4EHtDBNnlVw7YFpbA7nt2Nn1vUZxZlZ6FqZ/loSqHwRso3Rs2+6ZtLKbY13l5AKdDNmbBu4w54+6XZ1SUVp1BOkf8k8KkJj90woW2bJmn++OUJjW2a36Q9jc/Zqr/rJjxusteers9Szszdu2rHZ5uss7EhxLY1tGviWG9p8zqNy+ZS+jOvevzDGgr1D2Hnb5TnsvMYjr/fUIJ9/Ju9ecCPJxT/H8LO34yOWwfQEERzmPx9nHh5fyfbydcpAfsTyhniv2nSp+2iTA3xRcr0BJ+q7tuF8oHowZSwejVlHFs+T5O+UK0/PnbrGu5vfG8labo+TjkQfVb17+0i4imU6XigHIh8mB33O+32SXOq59j+RWcHz7eD6iDxI5SzVp5H87Nkd2jLMORDq7ZXppzbmfkRyoH3VyiFzquqs3cmfjbZ2OThM5KZ/0f5LHIE5UC62Vmy0LyPU/lcIkk9leXqtRdSTlxp/LHJVsex49plyU45EhF3oUzZd3fKmafHTVynyWP2ohR7j8rWl4l3sp+F1llyFOWEkv+inPn4KXbOEejBcW5mXgrMj4gDKXnS6oejm2V9p1ky8XPK9s8nbXK0a6+fmd+lTNv3H5QTbr5TXWmzmYbtKyJuG81/aPzjlKJ8s5OyOjWPcjb1+HH3QcBLO9gmJ27jHpMOKIuys1B19sqPKIW1d1BOiT+gWryZWwPry5Q/+DkRsQh4AWWn901gr4h4YPV8hwK3ZefCFJQd5GmZeQqQlNCYyWVuXwKeVxXPoMzj+fXqG9Cm2vT325RLEB5crXp0m9dtHJcpycz1wKcpZ76cXR0Mdeo8yre/c6uDxCNoPs5QLse8XXUpy7HABdW3Y5dR5rYlyq8uf5My5UBjG9dW672kWu82lB8UaXaGzmWUcXtUte6+lHl07txk3Vam/D62U/XrAOCfMvNzlOkW9qTFtlaN0WcoZ02/o2HRvYFlwHGZuZpyZveihufZws7bwZeAv4syj+D4FBs3Uj4cSVIvnEG5NP5wyheejR5POcPxRMrl7n9LZ7l7A+USTyhZM5PnO5ly6f9+lMsbp2JQ86FT51GmbKCa7ulpNMntKL++/aDq6ooXUD5H7V49/kXVOnehnI3TzLQ/l1ROopzh9HBuPUOsE1+mTIM0nvmD8KOlkmax6srMCyhTDoxrdRw7XftTcvJfKFPhjc9x2ipLdq/a9OqcMA9sB34MbIiIv6me60BK8XGHLImIO0TEr4EbM/O9lKLcAZRj7nURsbxa78mU+cSbmWmWnEy5YuKqbJjerQPnUabKW1RN13A0zY9xbwDuX01vtI
CGHxlrk6Nde/2IeCdlGonPU6ZCuoYy5cFXgadHxLJq1TdRHW9P8B3K9BV7RsSxTZZ3Mv5fBv6xYVv+AmWqhCltkxpcFmVnoWragc8AV0TEFZQzWcZ3Il8A3hERz6EcCN2Rcnnj1ZQd/Nsy8w+UH+D4WER8n1J43Uw5G3aid1NOt7+KcpnI9ykHQ9N1CmUn+J2I+DHljMZmc71t16q/VT8OA/6j6seBbZ6mcVym46Tq+VudjdLKOyjz2V1N6ffvaT7OUH4E7IvVun+iTGoO5eD6IRFxNWX+mE9Vl7FMdCTlEtGrKQHyOXacxw+A6jL9Qyk/fPZDyjeAz8rMX06hX1N+H9vJzD9Rxur7EfEjyjfF36T1tvZM4CmUaQp+ELfOwfx7yhj+pGrXcsq4jj/PZ4H/jIj7N7z2VyiF/osj4hrKQe4hLS7fkaQZy8zfUA7aflZlWaMPA4+p9uXfp1yKec8o87u28zLKZYDfp8w3u2a6z5eZv6cUcD/VwRUwEw1kPkTEnTp8yX8E7lON19mUH+tqltuvAd4SET8Avkb5wcxfVo+/a/X40yg/DtrMZcC9IuJzLZa3NT4NBnBW9eVxp4/7adXGL1efqfam9ecSSarLyyj728bbOx3HzuD5LwSur57nx5Tp4G6gdZa8uXr9VzTkyPkt1t1BdXXiocCbqoxYSZmT9v8mrPe/lILcRRHxPcqx37HV458OvKZ6/N9RjiebuYAyj+3rOmlbE6dTpsWb6jHuaZTj0h9QfhR7I82z5ELgPylXunyd8tliXKsc7ebrvxfYt/r8cAVlOotPZ+b5lOkCv1nl9e6U6QN2UmXs0ZRj54lz134OOCMintCmrS+jTLlwNeUHWq+mzB071W1SA2rOtm3TndpCs1X1jdBxwJsy8/+qM03PA+40g7lS+i7K/HE3ZObAnMof5Ye2bsrM86sD4LMpc7mdOMlDJUnqiypPvws8aopnzgy9iHgx8IPM/HZ1RsulwBvH5wacxvN9kVI4Pa2LzZy2iLgn5Uqat2bm1oh4OuUs5IP63DRJGhkR8S7g3Zn5P1Hmj/8hcK/qi8bpPN/NwP2neBJNz1RFyDtm5hnV7fcB6zPzn2bD60uN5ve7ARo+mXlTRGwEvhsRmyhz1DxzmAuynYqIV9P6jJ13tTgDdSZ+BHwkIt5OmUPnEqb+TaQkSbWoLs97O6UQOasKspVrgQ9Ulw8upPwQyLQKsp2IiKD8cFczmZmHd/klrwfuBFwdEZuBP1OuQJIkdc+vKGfAjv+2xTHTLch2IiKOpPyWRjOfyMx3dfklrwFeHRGvoUwn9EPg/3X5NQb59aXtPFNWkiRJkiRJkmrknLKSJEmSJEmSVCOLspIkSZIkSZJUo0GaU3YRcADl13639LktkqTBMw/Yg/IDQhv63JZhY8ZKktoxY6fHfJUktdM2XwepKHsA5RdqJUlq55HAN/rdiCFjxkqSOmHGTo35KknqRNN8HaSi7BqAP/7xFrZubf7jY7vttis33nhzrY2qg/0aPqPat1HtF4xu30a1X7Bz3+bOncPtbrcLVHmhKZk0Yzs1ytvcTDk27Tk+rTk27Tk+7XVjfMzYaTNfa+L4tOf4tObYtOf4tFZHvg5SUXYLwNat29oG2kzDblDZr+Ezqn0b1X7B6PZtVPsFLfvm5YFT11HGdmqUt7mZcmzac3xac2zac3za6+L4mLFTY77WyPFpz/FpzbFpz/Fprdf56g99SZIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjeb3uwGzwdJlS1i8qAz1+g2bWXvTuj63SJKk4deYr2DGSpLULWasJPWeRdkaLF40n+UrzwVg9aoVrO1zeyRJGgWN+QpmrCRJ3WLGSlLvOX2BJEmSJEmSJNXIoqwkSZIkSZIk1ciirCRJki
RJkiTVyKKsJEmSJEmSJNXIH/rqgYm/VClJkrrDjJUkqTfMWEmql3vcHmj2S5WSJGnmGjPWfJUkqXvMWEmql9MXSJIkSZIkSVKNLMpKkiRJkiRJUo2cvmCaJs63s37DZtbetK6PLZIkaTQ0Zqz5KklSd3gMK0mDxaLsNE2cN/bsdx7C2NjSPrZIkqTR0Jix5qskSd3hMawkDRaLsl2ycME8J0WXJKnLGvMVzFhJkrrFjJWk/nJOWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSarR/H43YFgsXbaExYscLkmSus2MlSSpN8xYSRpcHe2dI+KNwDOrm+dl5msi4qPAI4BbqvvfnJnnRMTBwAnAEuDMzDyu243uh8WL5rN85bnbb69etaKPrZEkjQLztTBjJUndZsYWjRlrvkrSYJm0KFsF1BOABwHbgC9FxNOA/YFHZeaahnWXAKcCjwZ+DZwXEU/KzAt60XhJkoaV+SpJUm+YsZKkYdDJmbJrgJWZuREgIn4M3K3679SIuDNwDvBm4EDgZ5l5XbXuGcBhgIEmSdKOzFdJknrDjJUkDbxJi7KZec34vyPi3pRLQB4JPAZ4MfBn4IvA84GbKQE4bg1wl+41V5Kk0WC+SpLUG2asJGkYdDzjd0TcDzgPeHVmJvC0hmUfAJ4NnEW5PGTcHGDrVBq02267tl0+NrZ0Kk83kJr1YRT61cyo9gtGt2+j2i8Y3b6Nar9gtPs2rq58hckztlOD/L70u239fv1B5/i05ti05/i05/g0NyjHsJ0a9Pex3+3r9+sPOsenNcemPcentV6PTac/9PVw4GzgFZn56Yh4ALBXZp5drTIH2ARcD+zR8NDdgd9OpUE33ngzW7dua7psbGwpN9ywdipP1zXdfCMm9qGf/eqlUe0XjG7fRrVfMLp9G9V+wc59mzt3TtcOegZFnfkK7TO2U73Y5nqZsXUa5b/HbnB8WnNs2nN82uvG+JixvTuG7VSvtnMzdnZwfFpzbNpzfFqrI187+aGvuwKfBw7PzIuru+cA742IiymXe7wAOB24vDwk9gSuA46gTJouSZIamK+SJPWGGStJGgadnCn7KmAxcEJEjN/3YeAdwDeBBcDZmfkpgIg4mvKN5GLgfMrlIJIkaUfmqyRJvWHGSpIGXic/9PVy4OUtFn+oyfoXAfvMsF2SJI0081WSpN4wYyVJw2BuvxsgSZIkSZIkSbOJRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmq0fx+N2C2W7psCQBjY0sBWL9hM2tvWtfPJkmSNBKWLlvC4kXlo475KklSdzTmK5ixkjRdFmVrtnHTlu0F2HHLV567/d+rV61gbd2NkiRpBLTLWPNVkqTpm5ixHsNK0sxZlK3ZwgXzdgowSZI0c2asJEm90Zix5qskdYdzykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjeb3uwHa0cZNWxgbWwrAho1bWLRw3vZl6zdsZu1N6/rVNEmShlZjvsKOGWu+SpI0fR7DStL0WJQdMAsXzGP5ynMBWL1qxfZ/j99e26
+GSZI0xBrzFXbMWPNVkqTp8xhWkqbH6QskSZIkSZIkqUYWZSVJkiRJkiSpRh1NXxARbwSeWd08LzNfExEHAycAS4AzM/O4at19gZOBZcDXgRdl5uaut1ySpCFnvkqS1BtmrCRp0E16pmwVXE8AHgTsC+wXEX8PnAqsAPYGDoiIJ1UPOQN4aWbuBcwBju1FwyVJGmbmqyRJvWHGSpKGQSfTF6wBVmbmxszcBPwY2Av4WWZeV32DeAZwWETcHViSmZdVjz0NOKwH7ZYkadiZr5Ik9YYZK0kaeJNOX5CZ14z/OyLuTbkE5AOUoBu3BrgLcKcW93dst912bbt8bGzpVJ5u5Axb/4etvVMxqn0b1X7B6PZtVPsFo923uvMVJs/YTo3i+9KtPo3i2HST49OaY9Oe49Oe47OjQTuG7dSovo9mbD0cn9Ycm/Ycn9Z6PTYdzSkLEBH3A84DXg1spnzTOG4OsJVy5u22Jvd37MYbb2br1m1Nl42NLeWGG9ZO5em6ZlA20n71fzr6+X712qj2bVT7BaPbt1HtF+zct7lz53TtoGeQ1JWv0D5jO9WLbW4QMrYbfRrlv8ducHxac2zac3za68b4mLG9O4btVK+2czN2dnB8WnNs2nN8WqsjXzuZvoCIeDhwEfDazDwduB7Yo2GV3YHftrlfkiRNYL5KktQbZqwkadB18kNfdwU+DxyRmZ+u7r68LIo9I2IecARwQWb+ClhfBSDAs4ALetBuSZKGmvkqSVJvmLGSpGHQyfQFrwIWAydExPh9HwaOBs6ulp0PnFUtOxI4KSKWAd8H3t/F9kqSNCrMV0mSesOMlSQNvE5+6OvlwMtbLN6nyfo/BA6cYbskSRpp5qskSb1hxkqShkFHc8pKkiRJkiRJkrqjk+kLZq2ly5aweJFDJElSt5mxkiR1n/kqScPDvXUbixfNZ/nKcwFYvWpFn1sjSdLoMGMlSeq+xnwFM1aSBpnTF0iSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo0sykqSJEmSJElSjSzKSpIkSZIkSVKNLMpKkiRJkiRJUo3m97sBmp6ly5aweNGtb9/6DZtZe9O6PrZIkqTRYMZKktQbjRlrvkqa7SzKDqnFi+azfOW522+vXrWCtX1sjyRJo8KMlSSpNxoz1nyVNNtZlB0iGzdtYWxsab+bIUnSSDFfJUnqDTNWklqzKDtEFi6Yt8O3ipIkaeYa8xXMWEmSusWMlaTW/KEvSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkaFvT3oAABqYSURBVEVZSZIkSZIkSarR/E5XjIhlwLeAQzLzlxHxUeARwC3VKm/OzHMi4mDgBGAJcGZmHtftRkuSNCrMV0mSesOMlSQNso6KshFxEHASsFfD3fsDj8rMNQ3rLQFOBR4N/Bo4LyKelJkXdK/JkiSNBvNVkqTeMGMlSYOu0zNljwVeAnwcICL+ArgbcGpE3Bk4B3gzcCDws8y8rlrvDOAwwECTJGln5qskSb1hxkqSBlpHRdnMPAYgIsbv2h24GHgx8Gfgi8DzgZuBNQ0PXQPcZSoN2m23XdsuHxtbOpWnm1UGcWwGsU3dMqp9G9V+wej2bVT7BaPdN6g3X2HyjO3UqL8vzXTa59k4NlPh+LTm2LTn+LTn+OxskI5hOzUb38ep9Hk2js9UOD
6tOTbtOT6t9XpsOp5TtlFm/gJ42vjtiPgA8GzgLGBbw6pzgK1Tee4bb7yZrVu3NV02NraUG25YO+X2TtewbZh1jk0n6n6/6jSqfRvVfsHo9m1U+wU7923u3DldO+gZVL3MV2ifsZ3q1jY3ihk7yn+P3eD4tObYtOf4tNeN8TFje3cM2ynztT33A+05Pq05Nu05Pq3Vka9zp/OkEfGAiDi04a45wCbgemCPhvt3B347ndeQJGm2MV8lSeoNM1aSNGimdaYsJcDeGxEXUy73eAFwOnA5EBGxJ3AdcARl0nRJkjQ581WSpN4wYyVJA2VaZ8pm5lXAO4BvAtcCV2bmpzJzPXA0cHZ1/08ol4NIkqRJmK+SJPWGGStJGjRTOlM2M+/R8O8PAR9qss5FwD4zbpkkSbOE+SpJUm+YsZKkQTWtM2UlSZIkSZIkSdNjUVaSJEmSJEmSamRRVpIkSZIkSZJqNKU5ZUfd0mVLWLzIIZEkqdvMWEmSesOMlaTh5J67weJF81m+8tztt1evWtHH1kiSNDrMWEmSeqMxY81XSRoeTl8gSZIkSZIkSTWyKCtJkiRJkiRJNbIoK0mSJEmSJEk1sigrSZIkSZIkSTWyKCtJkiRJkiRJNbIoK0mSJEmSJEk1sigrSZIkSZIkSTWa3+8GqDs2btrC2NhSANZv2Mzam9b1uUWSJI0GM1aSpO5rzFcwYyXNPhZlR8TCBfNYvvJcAFavWsHaPrdHkqRRYcZKktR9jfkKZqyk2cfpCyRJkiRJkiSpRhZlJUmSJEmSJKlGFmUlSZIkSZIkqUYWZSVJkiRJkiSpRv7Q1wjyVywlSeoNM1aSpN5ozFjzVdJsYFF2BPkrlpIk9YYZK0lSbzRmrPkqaTZw+gJJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqtGs/6GvpcuWsHjRrB8GSZK6ynyVJKk3zFhJGg2zfk++eNH8HX7hUZIkzVxjvoIZK0lSt5ixkjQanL5AkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmo0v98NUO9t3LSFsbGl22+v37CZtTet62OLJEkaDY0Zu3HTlj63RpKk0TDxGNaMlTSKLMrOAgsXzGP5ynO33169agVr+9geSZJGRWPGrl61os+tkSRpNDQ7hpWkUdNRUTYilgHfAg7JzF9GxMHACcAS4MzMPK5ab1/gZGAZ8HXgRZm5uSctlyRpBJixkiT1hhkrSRpkk84pGxEHAd8A9qpuLwFOBVYAewMHRMSTqtXPAF6amXsBc4Bje9FoSZJGgRkrSVJvmLGSpEHXyQ99HQu8BPhtdftA4GeZeV317eEZwGERcXdgSWZeVq13GnBYl9srSdIoMWMlSeoNM1aSNNAmnb4gM48BiIjxu+4ErGlYZQ1wlzb3T8luu+3adnnjZN+avrrGcZTfr1Ht26j2C0a3b6PaLxjtvsHgZWynRv19mQnHpj3HpzXHpj3Hpz3HZ2d1Zqz5Wg/Hpz3HpzXHpj3Hp7Vej810fuhrLrCt4fYcYGub+6fkxhtvZuvWbU2XjY0t5YYbuvsTVbN14+v2ODbTi/drUIxq30a1XzC6fRvVfsHOfZs7d07XDnoGWN8ytlOdbnPmqyYa5f3VTDk27Tk+7XVjfMzYmWVsnfk6vu5s5H6gNfeTrTk27Tk+rdWRr51MXzDR9cAeDbd3p1wS0up+SZLUGTNWkqTeMGMlSQNlOkXZy4GIiD0jYh5wBHBBZv4KWB8RD6/WexZwQZfaKUnSbGDGSpLUG2asJGmgTLkom5nrgaOBs4FrgZ8AZ1WLjwTeExE/AXYF3t+dZkqSNPrMWEmSesOMlSQNmo7nlM3MezT8+yJgnybr/JDyq5aSJKlDZqwkSb
1hxkqSBtV0pi+QJEmSJEmSJE2TRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSZIkSZIkSarR/H43QPXbuGkLY2NLAVi/YTNrb1q3fdnSZUtYvGh+02WSJKm1xnyFHXO0MV8nLpMkSe11egzbbLkkDSqLsrPQwgXzWL7yXADOfuchOxxAAtuXrV61grW1t06SpOHUmK+wc8Y2LjNjJUnqXKfHsGDGShoeFmVnuYkHkKtXrehjayRJGh2NGWu+SpLUHR7DShoVzikrSZIkSZIkSTWyKCtJkiRJkiRJNbIoK0mSJEmSJEk1sigrSZIkSZIkSTWyKCtJkiRJkiRJNbIoK0mSJEmSJEk1sigrSZIkSZIkSTWyKCtJkiRJkiRJNbIoK0mSJEmSJEk1sigrSZIkSZIkSTWyKCtJkiRJkiRJNbIoK0mSJEmSJEk1sigrSZIkSZIkSTWyKCtJkiRJkiRJNZrf7wZocG3ctIWxsaXbb6/fsJm1N63rY4skSRoNZqwkSb3RmLHmq6RBZlFWLS1cMI/lK8/dfnv1qhWs7WN7JEkaFWasJEm90Zix5qukQeb0BZIkSZIkSZJUI4uykiRJkiRJklQji7KSJEmSJEmSVCOLspIkSZIkSZJUI4uykiRJkiRJklSj+TN5cERcAtwR2FTd9ULgr4DjgAXAezPzgzNqoSRJs5AZK0lS95mvkqRBMe2ibETMAfYC7p6Zm6v77gx8GtgP2AB8KyIuycxru9FYSZJmAzNWkqTuM18lSYNkJmfKRvX/CyNiN+AkYC1wcWb+ASAizgKeAbxlRq2UJGl2MWMlSeo+81WSNDBmMqfs7YCLgKcBjwNeBNwNWNOwzhrgLjN4DUmSZiMzVpKk7jNfJUkDY9pnymbmt4Fvj9+OiFOAE4B/aVhtDrB1Ks+72267tl0+NrZ0Kk+nLpvq+I/y+zWqfRvVfsHo9m1U+wWj3bd2+pWxnZqt70uvzYZxnQ19nC7Hpj3Hpz3HpzPm6+w0W8Z1tvRzOhyb9hyf1no9NjOZU/YRwKLMvKi6aw7wS2CPhtV2B347lee98cab2bp1W9NlY2NLueGGtVNvbBtufFMzlfHvxfs1KEa1b6PaLxjdvo1qv2Dnvs2dO6drBz2Drh8Z26lOtznzdepG9W953Cjvr2bKsWnP8WmvG+MzWzJ2FPJ1fF11bjbsP9xPtubYtOf4tFZHvs5kTtnbAm+JiIdRfqXyOcBRwBkRMQbcAhwKvGAGr9F1S5ctYfGimXRbkqSeM2MlSeo+81WSNDBmMn3BFyPiIOAHwDzgg5n5zYh4PXAJsBA4OTO/052mdsfiRfNZvvLc7bdXr1rRx9ZIkrSzUchY81WSNGhGIV/BjJWkUTGjr9sy83jg+An3fRL45EyeV5Kk2c6MlSSp+8xXSdKgmNvvBkiSJEmSJEnSbGJRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSamRRVpIkSZIkSZJqZFFWkiRJkiRJkmpkUVaSJEmSJEmSajS/3w3QcFq6bAmLF926+azfsJm1N63rY4skSRoNZqwkSb3RmLHmq6R+syiraVm8aD7LV567/fbqVStY28f2SJI0KsxYSZJ6ozFjzVdJ/eb0BZIkSZIkSZJUI8+UVcc2btrC2NjSfjdDkqSRY8ZKktR95qukQWZRVh1buGDeDpd6NJoYdus3bK61bZIkDbNOM9b57yRJ6lxjvsKOGdvsGNaMlVQni7LqinZhJ0mSpm9iwdb57yRJmrlmx7BmrKQ6OaesJEmSJEmSJNXIoqwkSZIkSZIk1cjpC1S7pcuWsHhR2fSct0eSpO5ozFcwYyVJ6hYzVlIvWJRV7RYvmu/ceJIkdVljvoIZK0
lSt5ixknrBoqx6YuOmLSxcMI+xsaV+iyhJUpf4S9GSJPWGGSupbhZl1RP+UrQkSd3nL0VLktQbZqykuvlDX5IkSZIkSZJUI4uykiRJkiRJklQji7KSJEmSJEmSVCOLspIkSZIkSZJUI4uykiRJkiRJklSj+f1ugEbfxk1bGBtb2tGy9Rs2s/amdXU1TZKkoWbGSpLUG51mrPkqabosyqrnFi6Yx/KV526/vXrVirbL1tbaOkmShldjjjbm68Rl48vNWEmSOtNpxpqvkqbLoqwGimf1SJLUG57VI0lS93kMK2m6LMpqoHhWjyRJveFZPZIkdZ/HsJKma1YUZZcuW8LiRbOiq5Ik1cZ8lSSpN8xYSRp9s2Ivv3jR/JZzwUiSpOlpzFcwYyVJ6haPYSVp9M3tdwMkSZIkSZIkaTaZFWfKanj5oySSJHWfP0oiSVJveAwrqVMWZTXQ/FESSZK6zx8lkSSpNzyGldQppy+QJEmSJEmSpBp5pqyGhpdaSpLUG15qKUlS93kMK6kdi7IaGtO91HLpsiUsXnTrpm4QSpK0o+leamnGSpLU2kymC2rMWPNVGk0WZTXyFi+a77x5kiT1gBkrSVJvNGas+SqNJouyGglTOVPHS0gkSercdDPWfJUkqb1Oz4b1GFYaTT0pykbEEcBxwALgvZn5wV68TisTDx40miYGU6dn6ky8hOTsdx6y/Xk2bNzCooXzti+rO+wmO/D1EhZJ/c7YiftejZ5m7/F0MrYxXwE29Llg2y5jnYZBkvmqOrTL2Okew8JgZazHsFLnul65jIg7A28D9gM2AN+KiEsy89puv1YrzS6l0+iZOP9dt56n1YHnxAO2pcuWdD1QJrsMdLqXsHiwKY2GQcjYbu17NbiazX/XredplWF15FS7jJ3JNAwebErDb9DyFczYUdXvjO1VTrU7TvUYVmqtF6eTHgxcnJl/AIiIs4BnAG+Z5HHzAObOndN2pcmWj7vj7Za0vN2LZYP2PKPyGtN9nonfQE73NSY+z/P/5cLt/z7xnx6347eTDWfZbtiwmZtvXr992a67LmZRFSgTl03U7vUbl09c1u71Fy+av0PbTznuCdxS/S01tg1g2bIlbdvXa1MZq6nqdP8xbAaxXxO3q+m+l419a/j3vKYrzw49zdhODVJODHIWjeprTCVjWy1rlm+tMnbiVSyN+5Op7ms67UfjsnavDztmbGO+TmzfZG2rY1/ey4zttUHMun6ZuN1v3LRlxuNjxg5evk68bd6N/ni0O/ab6fOM59R0j2GbLZ+ok360O4ad+BpTOYbtd8Z269inX8zYWzW+l3Xk65xt27bN6AUmiojXAbtk5nHV7WOAAzPzBZM89BHApV1tjCRpFD0S+Ea/G9EPZqwkqcdmZcaar5KkHmuar704U3Yu0FjpnQNs7eBx36U0cg2wpQftkiQNt3nAHpS8mK3MWElSL8z2jDVfJUm90DZfe1GUvZ4STON2B37bweM2MAu/lZUkTcnP+92APjNjJUm9Mpsz1nyVJPVKy3ztRVH2q8CbImIMuAU4FJjssg9JkjQ5M1aSpO4zXyVJtZvb7SfMzN8ArwcuAa4EPpmZ3+n260iSNNuYsZIkdZ/5Kknqh67/0JckSZIkSZIkqbWunykrSZIkSZIkSWrNoqwkSZIkSZIk1ciirCRJkiRJkiTVyKKsJEmSJEmSJNXIoqwkSZIkSZIk1Wh+vxvQiYg4AjgOWAC8NzM/2OcmTSoi3gg8s7p5Xma+JiIOBk4AlgBnZuZx1br7AicDy4CvAy/KzM0RcTfgDOCOQAJHZubNNXelqYh4N3CHzDx6qu2PiNsCnwDuBdwAPDMzf9eXjjSIiOXAG4FdgAsz8+Wj8J5FxFHA66qbF2Tmq4b9PYuIZcC3gEMy85fdep/63c8m/XoB8DJgG3AF8MLM3Djs/Wq4/6XAMzLzMdXtKbU/IhYCpwD7A+uAIzLzJ3X1a5RNlruttsHaG9onHYzPCuDNwBzgOu
C5mfnH2hvaJ51+bouIpwD/npn3rLN9/dTBthPAR4DbAb8D/s5tZ4flD6aMz0Lg18BRmfmn2hvaJ63ytFo2q/fLw8SMbc+Mbc18bc+Mbc+Mba9fGTvwZ8pGxJ2BtwGPAPYFXhAR9+1vq9qrCkRPAB5EafN+EfH3wKnACmBv4ICIeFL1kDOAl2bmXpRwOba6/0PAhzLzPpSCzPH19aK1iHgc8JyGu6ba/n8BLs3MvYGTgPfV0vA2IuJewIeBvwUeCDy4en+G+j2LiL8A3g88GtgHeGS1fQ7texYRBwHfAPaqbi+he+9T3/rZpF97Aa8GHkbZJucCL6lWH9p+Ndx/X+C1E1afavtfBtxS3f8K4LRe9GG26TB3W22DI2+y8ak+0J0IPCUz9wGuAt7Uh6b2Raef2yLiL4F3U7afWaGDbWcO8AXgndW28wN23k+OrA63nfcBb6jGJ4FX1dvK/mmVpw1m7X55mJix7ZmxrZmv7Zmx7Zmx7fUzYwe+KAscDFycmX/IzFuAs4Bn9LlNk1kDrMzMjZm5Cfgx5c39WWZeV1XUzwAOi4i7A0sy87LqsadV9y8AHkXp7/b7a+xDUxFxe8of89ur29Np/1MoZ70BfAp4UrV+Pz2Ncobl9dV7djjwfwz/ezaP8ne+C+UbsQXAJob7PTuWUpz8bXX7QLr3PvWznxP7tQF4cWbelJnbgKuBu41Av4iIRZRvYd/QcN902r/9/sz8OjBWnW2rmWmbu622wdpb2T+TfS5ZALwkM39T3b4KmE3bZaef206mnOk0m0w2Ng+mfNH0per224GBvzqsizrZduZRzlIB+AvKVRKzxU55Os798lAxY9szY1szX9szY9szY9vrW8YOQ1H2TpQi57g1wF361JaOZOY1429YRNybMo3BVpr3o1X/7gDc1HBK9KD0+yPA64Hx0/yn0/7tj6mW3wSM9bbZk9oTmBcRX4iIK4EX07pvQ/OeZeZaytmGPwGuB34JbGSI37PMPCYzL224q5vvU9/6ObFfmfmrzPwKQESMAS8FzmXI+1V5B+Xs5l803Ded9g9dPgyJycZ1to972/5n5o2ZeQ5sP5P/tcDna21hf026fUTEy4DvA5cxu0w2NnsCv4uIUyLi+5SzwQZi2qqadLJveSVwUkSsAR5PucppVmiRp+Nm+355mJix7ZmxrZmv7Zmx7ZmxbfQzY4ehKDuXMp/iuDmUAufAi4j7AV+hXIL8C5r3o1X/Jt4Pfe53RBwD/DozL2q4ezrtn3gpxSC8p/Mp3x49H3gocBBl/sphf88eCDwPuDtlZ7KFMrXGKLxn4zp9P4ayn9WlJhcBp2Tm1xjyfkXE44G7ZeZHJyyaTvuHNh8G3GTjOtvHvaP+R8RtgPOAH2bm6TW1bRC0HZ+IuD9wKPDWmts1CCbbduYDjwFOzMwHUz47nlBb6/pvsm1nCWUe8YMzcw/KlDcfq7WFg2u275eHiRnbnhnbmvnanhnbnhk7fT3dLw9DUfZ6YI+G27vT5JTiQRMRD6cUUl5bBUWrfrS6//fAbSJiXnX/HvS/34cDT6jOJH0L8FTgGKbe/t9U6xER84GlwI09b317vwO+mpk3ZOY64BxKkXbY37MnAhdl5u8zcwPlVPvHMBrv2bhu/m0NVD8j4j6UycZPz8zxD1jD3q+/B+5X7UdOBvaPiDOZXvuHMh+GwGTjOtvHfdL+R8QewKWUyyqPqa9pA2Gy8TmsWn4FcD5wp4hodWbCqJlsbH5HmY7niur2pyhT9MwWk43P/YF1mfmd6vZHKJ9p5H55mJix7ZmxrZmv7Zmx7Zmx09fT/fIwFGW/CjwuIsaqHy06FPjSJI/pq4i4K+UyiiMy89PV3ZeXRbFnVXQ4ArggM38FrK+KuADPqu7fRAmbw6v7nw1cUFsnmsjMx2fm/TNzX8pckF/IzOcy9fafX92mWn5ptX4/fRF4YkTctnp/nkSZZ2Wo3zPgh8DBEbFLNbn5cuA/GY33bFw3/7YGpp8RsR
S4EDguM1eN3z/s/crM52Xm3tV+5Bjgisw8fJrt335/RDwCWJ+Z/11TV0ZZ29xttQ3W38y+aTs+1X5oNfCZzHxFNSf0bDLZ9vPGzNyr2gc8GfhtZj6yT22t22Sfab9FmRt7n+r2cuB7NbexnyYbn/8C7hoRUd1eAXy35jYOJPfLQ8WMbc+Mbc18bc+Mbc+MnaZe75cHvihbTeL9euAS4Ergkw3V+0H1KmAxcEJEXFmdEXZ09d/ZwLWUOT7Hf9DmSOA9EfETYFfg/dX9L6b8Kt61wCOB4+rqwBRNtf3HAw+JiGuqdV5Cn2Xm5cC/UX5x71rgV5R5Zo5miN+zzLyQ8i3g9yjfJi8A3skIvGfjMnM93XufBqmfxwB/Cawc349ExFuqZcPcr3am2v4PAIuq+99PCUjNUKvcjYjzI2L/arVW2+DI62B8nkr5MYlnNPztntzHJteqw+1nVppsbKordZ5Gmc/tGuCxwMr+tbheHYzPHyl5/5mIuIoyPdNz+9bgAeB+efiYse2Zsa2Zr+2Zse2ZsVNX1355zrZts+nLJUmSJEmSJEnqr4E/U1aSJEmSJEmSRolFWUmSJEmSJEmqkUVZSZIkSZIkSaqRRVlJkiRJkiRJqpFFWUmSJEmSJEmqkUVZSdL/b8eOBQAAAAAG+VvPYldhBAAAAIykLAAAAADAKKl0aHo4yf4mAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from sklearn.preprocessing import MinMaxScaler\n", "\n", "x = house_df['sqft_living']\n", "x_new = (x-x.min())/(x.max()-x.min())\n", "\n", "# reshaping x to be a n by 1 matrix since that's how scikit learn likes data for normalization\n", "x_reshaped = np.array(x).reshape(-1,1)\n", "x_new_sklearn = MinMaxScaler().fit_transform(x_reshaped)\n", "\n", "# Plotting the histogram of the variable before normalization\n", "fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(24,5))\n", "ax = ax.ravel()\n", "\n", "ax[0].hist(x, bins=100)\n", "ax[0].set_title('Histogram of sqft_living before normalization')\n", "\n", "ax[1].hist(x_new, bins=100)\n", "ax[1].set_title('Manually normalizing sqft_living')\n", "\n", "ax[2].hist(x_new_sklearn, bins=100)\n", "ax[2].set_title('Normalizing sqft_living using scikit learn');\n", "\n", "# making things a dataframe to check if they work\n", "pd.DataFrame({'x': x, 'x_new_manual': x_new, 'x_new_sklearn': x_new_sklearn.flatten()}).describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The million dollar question**\n", "\n", "Should I standardize or normalize my data? [This](https://medium.com/@rrfd/standardize-or-normalize-examples-in-python-e3f174b65dfc), [this](https://medium.com/@swethalakshmanan14/how-when-and-why-should-you-normalize-standardize-rescale-your-data-3f083def38ff) and [this](https://stackoverflow.com/questions/32108179/linear-regression-normalization-vs-standardization) are useful resources that I highly recommend. But in a nutshell, what they say is the following: \n", "\n", "**Pros of Normalization**\n", "\n", "1. Normalization (which makes your data go from 0-1) is widely used in image processing and computer vision, where pixel intensities are non-negative and are typically scaled from a 0-255 scale to a 0-1 range for a lot of different algorithms. \n", "2. 
Normalization is also very useful in neural networks (which we will see later in the course), as it helps the algorithms converge faster.\n", "3. Normalization is useful when your data does not have a discernible distribution and you are not making assumptions about your data's distribution.\n", "\n", "**Pros of Standardization**\n", "\n", "1. Standardization maintains outliers (do you see why?), whereas normalization makes outliers less obvious. In applications where outliers are informative, prefer standardization.\n", "2. Standardization is useful when you assume your data comes from a Gaussian distribution (or something that is approximately Gaussian). \n", "\n", "**Some General Advice**\n", "\n", "1. We learn parameters for standardization ($\\mu$ and $\\sigma$) and for normalization ($x_{min}$ and $x_{max}$). Make sure these parameters are learned on the training set, i.e., use the training set parameters even when normalizing/standardizing the validation and test sets. In sklearn terms, fit your scaler on the training set and use that fitted scaler to transform your validation set and test set (**don't re-fit your scaler on validation or test set data!**).\n", "2. The point of standardization and normalization is to put your variables on a comparable, more manageable scale. You should ideally standardize or normalize all your variables at the same time. \n", "3. Standardization and normalization are not always needed and are not an automatic step for every data science homework! Use them sparingly and try to justify why they are needed.\n", "\n", "**Interpreting Coefficients**\n", "\n", "A great quote from [here](https://stats.stackexchange.com/questions/29781/when-conducting-multiple-regression-when-should-you-center-your-predictor-varia):\n", "\n", "> [Standardization] makes it so the intercept term is interpreted as the expected value of 𝑌𝑖 when the predictor values are set to their means. 
Otherwise, the intercept is interpreted as the expected value of 𝑌𝑖 when the predictors are set to 0, which may not be a realistic or interpretable situation (e.g. what if the predictors were height and weight?)\n", "\n", "### Standardizing our Design Matrix" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
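<table border=\"1\" class=\"dataframe\">\n",
"<thead><tr><th></th><th>bedrooms</th><th>bathrooms</th><th>sqft_living</th><th>sqft_lot</th><th>floors</th><th>sqft_above</th><th>sqft_basement</th><th>lat</th><th>long</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>count</th><td>2.400000e+03</td><td>2.400000e+03</td><td>2.400000e+03</td><td>2.400000e+03</td><td>2.400000e+03</td><td>2.400000e+03</td><td>2.400000e+03</td><td>2.400000e+03</td><td>2.400000e+03</td></tr>\n",
"<tr><th>mean</th><td>2.250977e-16</td><td>4.503342e-17</td><td>1.471971e-16</td><td>-2.406640e-17</td><td>2.680263e-16</td><td>-8.234154e-18</td><td>-1.709281e-16</td><td>4.928733e-14</td><td>4.897231e-14</td></tr>\n",
"<tr><th>std</th><td>1.000208e+00</td><td>1.000208e+00</td><td>1.000208e+00</td><td>1.000208e+00</td><td>1.000208e+00</td><td>1.000208e+00</td><td>1.000208e+00</td><td>1.000208e+00</td><td>1.000208e+00</td></tr>\n",
"<tr><th>min</th><td>-3.618993e+00</td><td>-2.677207e+00</td><td>-1.766429e+00</td><td>-3.364203e-01</td><td>-8.897850e-01</td><td>-1.636285e+00</td><td>-6.704685e-01</td><td>-2.937091e+00</td><td>-2.084576e+00</td></tr>\n",
"<tr><th>25%</th><td>-4.009185e-01</td><td>-4.598398e-01</td><td>-7.087691e-01</td><td>-2.324570e-01</td><td>-8.897850e-01</td><td>-7.089826e-01</td><td>-6.704685e-01</td><td>-6.732889e-01</td><td>-8.086270e-01</td></tr>\n",
"<tr><th>50%</th><td>-4.009185e-01</td><td>1.736938e-01</td><td>-1.933403e-01</td><td>-1.774091e-01</td><td>-8.897850e-01</td><td>-2.895998e-01</td><td>-6.704685e-01</td><td>8.468878e-02</td><td>-1.278830e-01</td></tr>\n",
"<tr><th>75%</th><td>6.717731e-01</td><td>4.904606e-01</td><td>4.973342e-01</td><td>-1.033061e-01</td><td>9.975186e-01</td><td>5.375162e-01</td><td>6.315842e-01</td><td>8.607566e-01</td><td>6.455277e-01</td></tr>\n",
"<tr><th>max</th><td>7.107923e+00</td><td>7.459330e+00</td><td>1.179553e+01</td><td>1.945618e+01</td><td>3.828474e+00</td><td>8.878574e+00</td><td>8.291994e+00</td><td>1.560846e+00</td><td>6.062967e+00</td></tr>\n",
"</tbody>\n",
"</table>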
\n", "
" ], "text/plain": [ " bedrooms bathrooms sqft_living sqft_lot floors \\\n", "count 2.400000e+03 2.400000e+03 2.400000e+03 2.400000e+03 2.400000e+03 \n", "mean 2.250977e-16 4.503342e-17 1.471971e-16 -2.406640e-17 2.680263e-16 \n", "std 1.000208e+00 1.000208e+00 1.000208e+00 1.000208e+00 1.000208e+00 \n", "min -3.618993e+00 -2.677207e+00 -1.766429e+00 -3.364203e-01 -8.897850e-01 \n", "25% -4.009185e-01 -4.598398e-01 -7.087691e-01 -2.324570e-01 -8.897850e-01 \n", "50% -4.009185e-01 1.736938e-01 -1.933403e-01 -1.774091e-01 -8.897850e-01 \n", "75% 6.717731e-01 4.904606e-01 4.973342e-01 -1.033061e-01 9.975186e-01 \n", "max 7.107923e+00 7.459330e+00 1.179553e+01 1.945618e+01 3.828474e+00 \n", "\n", " sqft_above sqft_basement lat long \n", "count 2.400000e+03 2.400000e+03 2.400000e+03 2.400000e+03 \n", "mean -8.234154e-18 -1.709281e-16 4.928733e-14 4.897231e-14 \n", "std 1.000208e+00 1.000208e+00 1.000208e+00 1.000208e+00 \n", "min -1.636285e+00 -6.704685e-01 -2.937091e+00 -2.084576e+00 \n", "25% -7.089826e-01 -6.704685e-01 -6.732889e-01 -8.086270e-01 \n", "50% -2.895998e-01 -6.704685e-01 8.468878e-02 -1.278830e-01 \n", "75% 5.375162e-01 6.315842e-01 8.607566e-01 6.455277e-01 \n", "max 8.878574e+00 8.291994e+00 1.560846e+00 6.062967e+00 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
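<table border=\"1\" class=\"dataframe\">\n",
"<thead><tr><th></th><th>bedrooms</th><th>bathrooms</th><th>sqft_living</th><th>sqft_lot</th><th>floors</th><th>sqft_above</th><th>sqft_basement</th><th>lat</th><th>long</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>count</th><td>800.000000</td><td>800.000000</td><td>800.000000</td><td>800.000000</td><td>800.000000</td><td>800.000000</td><td>800.000000</td><td>800.000000</td><td>800.000000</td></tr>\n",
"<tr><th>mean</th><td>0.018772</td><td>0.041444</td><td>0.024401</td><td>0.016506</td><td>0.026737</td><td>0.044415</td><td>-0.031370</td><td>-0.056059</td><td>0.016900</td></tr>\n",
"<tr><th>std</th><td>0.982683</td><td>0.997594</td><td>0.989079</td><td>1.074079</td><td>0.991645</td><td>0.993807</td><td>0.999638</td><td>1.008010</td><td>1.028649</td></tr>\n",
"<tr><th>min</th><td>-2.546302</td><td>-1.726907</td><td>-1.626232</td><td>-0.328715</td><td>-0.889785</td><td>-1.477851</td><td>-0.670469</td><td>-2.693960</td><td>-2.141602</td></tr>\n",
"<tr><th>25%</th><td>-0.400918</td><td>-0.459840</td><td>-0.677843</td><td>-0.234254</td><td>-0.889785</td><td>-0.685684</td><td>-0.670469</td><td>-0.737509</td><td>-0.815755</td></tr>\n",
"<tr><th>50%</th><td>-0.400918</td><td>0.173694</td><td>-0.172723</td><td>-0.177521</td><td>0.053867</td><td>-0.266301</td><td>-0.670469</td><td>0.031504</td><td>-0.088678</td></tr>\n",
"<tr><th>75%</th><td>0.671773</td><td>0.490461</td><td>0.487026</td><td>-0.113533</td><td>0.997519</td><td>0.595764</td><td>0.550206</td><td>0.817340</td><td>0.597412</td></tr>\n",
"<tr><th>max</th><td>8.180614</td><td>4.925195</td><td>7.321611</td><td>21.716593</td><td>2.884822</td><td>5.139078</td><td>5.839795</td><td>1.554333</td><td>6.369480</td></tr>\n",
"</tbody>\n",
"</table>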
\n", "
" ], "text/plain": [ " bedrooms bathrooms sqft_living sqft_lot floors \\\n", "count 800.000000 800.000000 800.000000 800.000000 800.000000 \n", "mean 0.018772 0.041444 0.024401 0.016506 0.026737 \n", "std 0.982683 0.997594 0.989079 1.074079 0.991645 \n", "min -2.546302 -1.726907 -1.626232 -0.328715 -0.889785 \n", "25% -0.400918 -0.459840 -0.677843 -0.234254 -0.889785 \n", "50% -0.400918 0.173694 -0.172723 -0.177521 0.053867 \n", "75% 0.671773 0.490461 0.487026 -0.113533 0.997519 \n", "max 8.180614 4.925195 7.321611 21.716593 2.884822 \n", "\n", " sqft_above sqft_basement lat long \n", "count 800.000000 800.000000 800.000000 800.000000 \n", "mean 0.044415 -0.031370 -0.056059 0.016900 \n", "std 0.993807 0.999638 1.008010 1.028649 \n", "min -1.477851 -0.670469 -2.693960 -2.141602 \n", "25% -0.685684 -0.670469 -0.737509 -0.815755 \n", "50% -0.266301 -0.670469 0.031504 -0.088678 \n", "75% 0.595764 0.550206 0.817340 0.597412 \n", "max 5.139078 5.839795 1.554333 6.369480 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
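<table border=\"1\" class=\"dataframe\">\n",
"<thead><tr><th></th><th>bedrooms</th><th>bathrooms</th><th>sqft_living</th><th>sqft_lot</th><th>floors</th><th>sqft_above</th><th>sqft_basement</th><th>lat</th><th>long</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>count</th><td>800.000000</td><td>800.000000</td><td>800.000000</td><td>800.000000</td><td>800.000000</td><td>800.000000</td><td>800.000000</td><td>800.000000</td><td>800.000000</td></tr>\n",
"<tr><th>mean</th><td>0.010727</td><td>-0.018346</td><td>-0.029080</td><td>0.052808</td><td>0.006684</td><td>-0.021866</td><td>-0.020484</td><td>-0.005631</td><td>0.000897</td></tr>\n",
"<tr><th>std</th><td>0.965422</td><td>0.963162</td><td>0.946367</td><td>1.569619</td><td>1.012587</td><td>0.955805</td><td>0.938793</td><td>1.022830</td><td>1.028155</td></tr>\n",
"<tr><th>min</th><td>-3.618993</td><td>-2.677207</td><td>-1.657158</td><td>-0.335005</td><td>-0.889785</td><td>-1.524449</td><td>-0.670469</td><td>-2.656332</td><td>-1.778063</td></tr>\n",
"<tr><th>25%</th><td>-0.400918</td><td>-0.459840</td><td>-0.701038</td><td>-0.229160</td><td>-0.889785</td><td>-0.708983</td><td>-0.670469</td><td>-0.624084</td><td>-0.789024</td></tr>\n",
"<tr><th>50%</th><td>-0.400918</td><td>0.173694</td><td>-0.183032</td><td>-0.174051</td><td>-0.889785</td><td>-0.295425</td><td>-0.670469</td><td>0.174054</td><td>-0.174216</td></tr>\n",
"<tr><th>75%</th><td>0.671773</td><td>0.490461</td><td>0.417443</td><td>-0.100683</td><td>0.997519</td><td>0.479269</td><td>0.588182</td><td>0.842124</td><td>0.604540</td></tr>\n",
"<tr><th>max</th><td>4.962540</td><td>3.658128</td><td>5.785633</td><td>36.746809</td><td>3.828474</td><td>5.010933</td><td>4.016921</td><td>1.544926</td><td>6.412249</td></tr>\n",
"</tbody>\n",
"</table>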
\n", "
" ], "text/plain": [ " bedrooms bathrooms sqft_living sqft_lot floors \\\n", "count 800.000000 800.000000 800.000000 800.000000 800.000000 \n", "mean 0.010727 -0.018346 -0.029080 0.052808 0.006684 \n", "std 0.965422 0.963162 0.946367 1.569619 1.012587 \n", "min -3.618993 -2.677207 -1.657158 -0.335005 -0.889785 \n", "25% -0.400918 -0.459840 -0.701038 -0.229160 -0.889785 \n", "50% -0.400918 0.173694 -0.183032 -0.174051 -0.889785 \n", "75% 0.671773 0.490461 0.417443 -0.100683 0.997519 \n", "max 4.962540 3.658128 5.785633 36.746809 3.828474 \n", "\n", " sqft_above sqft_basement lat long \n", "count 800.000000 800.000000 800.000000 800.000000 \n", "mean -0.021866 -0.020484 -0.005631 0.000897 \n", "std 0.955805 0.938793 1.022830 1.028155 \n", "min -1.524449 -0.670469 -2.656332 -1.778063 \n", "25% -0.708983 -0.670469 -0.624084 -0.789024 \n", "50% -0.295425 -0.670469 0.174054 -0.174216 \n", "75% 0.479269 0.588182 0.842124 0.604540 \n", "max 5.010933 4.016921 1.544926 6.412249 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'sqft_above', 'sqft_basement',\n", " 'lat', 'long']\n", "\n", "X_train = train_df[features]\n", "y_train = np.array(train_df['price']).reshape(-1,1)\n", "\n", "X_val = val_df[features]\n", "y_val = np.array(val_df['price']).reshape(-1,1)\n", "\n", "X_test = test_df[features]\n", "y_test = np.array(test_df['price']).reshape(-1,1)\n", "\n", "scaler = StandardScaler().fit(X_train)\n", "\n", "# This converts our matrices into numpy matrices\n", "X_train_t = scaler.transform(X_train)\n", "X_val_t = scaler.transform(X_val)\n", "X_test_t = scaler.transform(X_test)\n", "\n", "# Making the numpy matrices pandas dataframes\n", "X_train_df = pd.DataFrame(X_train_t, columns=features)\n", "X_val_df = pd.DataFrame(X_val_t, columns=features)\n", "X_test_df = pd.DataFrame(X_test_t, columns=features)\n", "\n", "display(X_train_df.describe())\n", 
"display(X_val_df.describe())\n", "display(X_test_df.describe())" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "scaler = StandardScaler().fit(y_train)\n", "y_train = scaler.transform(y_train)\n", "y_val = scaler.transform(y_val)\n", "y_test = scaler.transform(y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## One-Degree Polynomial Model" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
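<table class=\"simpletable\">\n",
"<caption>OLS Regression Results</caption>\n",
"<tr><th>Dep. Variable:</th><td>y</td><th>R-squared:</th><td>0.586</td></tr>\n",
"<tr><th>Model:</th><td>OLS</td><th>Adj. R-squared:</th><td>0.584</td></tr>\n",
"<tr><th>Method:</th><td>Least Squares</td><th>F-statistic:</th><td>422.3</td></tr>\n",
"<tr><th>Date:</th><td>Mon, 07 Oct 2019</td><th>Prob (F-statistic):</th><td>0.00</td></tr>\n",
"<tr><th>Time:</th><td>15:13:28</td><th>Log-Likelihood:</th><td>-2348.5</td></tr>\n",
"<tr><th>No. Observations:</th><td>2400</td><th>AIC:</th><td>4715.</td></tr>\n",
"<tr><th>Df Residuals:</th><td>2391</td><th>BIC:</th><td>4767.</td></tr>\n",
"<tr><th>Df Model:</th><td>8</td><th></th><td></td></tr>\n",
"<tr><th>Covariance Type:</th><td>nonrobust</td><th></th><td></td></tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr><td></td><th>coef</th><th>std err</th><th>t</th><th>P&gt;|t|</th><th>[0.025</th><th>0.975]</th></tr>\n",
"<tr><th>const</th><td>-5.145e-15</td><td>0.013</td><td>-3.91e-13</td><td>1.000</td><td>-0.026</td><td>0.026</td></tr>\n",
"<tr><th>bedrooms</th><td>-0.1592</td><td>0.017</td><td>-9.505</td><td>0.000</td><td>-0.192</td><td>-0.126</td></tr>\n",
"<tr><th>bathrooms</th><td>0.0422</td><td>0.022</td><td>1.914</td><td>0.056</td><td>-0.001</td><td>0.085</td></tr>\n",
"<tr><th>sqft_living</th><td>0.4011</td><td>0.011</td><td>36.238</td><td>0.000</td><td>0.379</td><td>0.423</td></tr>\n",
"<tr><th>sqft_lot</th><td>-0.0058</td><td>0.014</td><td>-0.420</td><td>0.675</td><td>-0.033</td><td>0.021</td></tr>\n",
"<tr><th>floors</th><td>-0.0470</td><td>0.017</td><td>-2.690</td><td>0.007</td><td>-0.081</td><td>-0.013</td></tr>\n",
"<tr><th>sqft_above</th><td>0.3866</td><td>0.013</td><td>30.254</td><td>0.000</td><td>0.362</td><td>0.412</td></tr>\n",
"<tr><th>sqft_basement</th><td>0.1242</td><td>0.014</td><td>8.651</td><td>0.000</td><td>0.096</td><td>0.152</td></tr>\n",
"<tr><th>lat</th><td>0.2414</td><td>0.013</td><td>17.983</td><td>0.000</td><td>0.215</td><td>0.268</td></tr>\n",
"<tr><th>long</th><td>-0.1388</td><td>0.014</td><td>-9.605</td><td>0.000</td><td>-0.167</td><td>-0.110</td></tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr><th>Omnibus:</th><td>1646.401</td><th>Durbin-Watson:</th><td>2.009</td></tr>\n",
"<tr><th>Prob(Omnibus):</th><td>0.000</td><th>Jarque-Bera (JB):</th><td>52596.394</td></tr>\n",
"<tr><th>Skew:</th><td>2.797</td><th>Prob(JB):</th><td>0.00</td></tr>\n",
"<tr><th>Kurtosis:</th><td>25.241</td><th>Cond. No.</th><td>2.95e+19</td></tr>\n",
"</table>\n",
"<br/><br/>Warnings:<br/>[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.<br/>[2] The smallest eigenvalue is 9.63e-36. This might indicate that there are<br/>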
strong multicollinearity problems or that the design matrix is singular." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: y R-squared: 0.586\n", "Model: OLS Adj. R-squared: 0.584\n", "Method: Least Squares F-statistic: 422.3\n", "Date: Mon, 07 Oct 2019 Prob (F-statistic): 0.00\n", "Time: 15:13:28 Log-Likelihood: -2348.5\n", "No. Observations: 2400 AIC: 4715.\n", "Df Residuals: 2391 BIC: 4767.\n", "Df Model: 8 \n", "Covariance Type: nonrobust \n", "=================================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "---------------------------------------------------------------------------------\n", "const -5.145e-15 0.013 -3.91e-13 1.000 -0.026 0.026\n", "bedrooms -0.1592 0.017 -9.505 0.000 -0.192 -0.126\n", "bathrooms 0.0422 0.022 1.914 0.056 -0.001 0.085\n", "sqft_living 0.4011 0.011 36.238 0.000 0.379 0.423\n", "sqft_lot -0.0058 0.014 -0.420 0.675 -0.033 0.021\n", "floors -0.0470 0.017 -2.690 0.007 -0.081 -0.013\n", "sqft_above 0.3866 0.013 30.254 0.000 0.362 0.412\n", "sqft_basement 0.1242 0.014 8.651 0.000 0.096 0.152\n", "lat 0.2414 0.013 17.983 0.000 0.215 0.268\n", "long -0.1388 0.014 -9.605 0.000 -0.167 -0.110\n", "==============================================================================\n", "Omnibus: 1646.401 Durbin-Watson: 2.009\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 52596.394\n", "Skew: 2.797 Prob(JB): 0.00\n", "Kurtosis: 25.241 Cond. No. 2.95e+19\n", "==============================================================================\n", "\n", "Warnings:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "[2] The smallest eigenvalue is 9.63e-36. 
This might indicate that there are\n", "strong multicollinearity problems or that the design matrix is singular.\n", "\"\"\"" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import statsmodels.api as sm\n", "from statsmodels.regression.linear_model import OLS\n", "\n", "model_1 = OLS(np.array(y_train).reshape(-1,1), sm.add_constant(X_train_df)).fit()\n", "model_1.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Two-Degree Polynomial Model" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2400, 9) (2400, 18)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
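<table border=\"1\" class=\"dataframe\">\n",
"<thead><tr><th></th><th>bedrooms</th><th>bathrooms</th><th>sqft_living</th><th>sqft_lot</th><th>floors</th><th>sqft_above</th><th>sqft_basement</th><th>lat</th><th>long</th><th>bedrooms^2</th><th>bathrooms^2</th><th>sqft_living^2</th><th>sqft_lot^2</th><th>floors^2</th><th>sqft_above^2</th><th>sqft_basement^2</th><th>lat^2</th><th>long^2</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>-0.400918</td><td>-0.459840</td><td>-0.533523</td><td>-0.184294</td><td>-0.889785</td><td>-0.243002</td><td>-0.670469</td><td>-0.261919</td><td>-1.179294</td><td>-0.462425</td><td>-0.498149</td><td>-0.435619</td><td>-0.081332</td><td>-0.820725</td><td>-0.317640</td><td>-0.429442</td><td>-0.263451</td><td>1.180094</td></tr>\n",
"<tr><th>1</th><td>-0.400918</td><td>1.123994</td><td>0.919986</td><td>-0.129729</td><td>0.997519</td><td>1.399581</td><td>-0.670469</td><td>0.525365</td><td>0.289117</td><td>-0.462425</td><td>0.962623</td><td>0.551247</td><td>-0.079773</td><td>0.882097</td><td>1.104202</td><td>-0.429442</td><td>0.524670</td><td>-0.289785</td></tr>\n",
"<tr><th>2</th><td>0.671773</td><td>0.490461</td><td>-0.049020</td><td>-0.167446</td><td>-0.889785</td><td>-0.860426</td><td>1.499619</td><td>0.720739</td><td>0.545733</td><td>0.533184</td><td>0.286055</td><td>-0.174327</td><td>-0.080898</td><td>-0.820725</td><td>-0.625213</td><td>0.965746</td><td>0.720531</td><td>-0.546402</td></tr>\n",
"<tr><th>3</th><td>-0.400918</td><td>0.490461</td><td>-0.121180</td><td>-0.035583</td><td>-0.889785</td><td>0.222979</td><td>-0.670469</td><td>0.066599</td><td>-0.088678</td><td>-0.462425</td><td>0.286055</td><td>-0.217531</td><td>-0.076044</td><td>-0.820725</td><td>-0.003426</td><td>-0.429442</td><td>0.065197</td><td>0.088151</td></tr>\n",
"<tr><th>4</th><td>-0.400918</td><td>0.490461</td><td>-0.327352</td><td>-0.187215</td><td>-0.889785</td><td>-0.452693</td><td>0.154165</td><td>-1.411729</td><td>0.232092</td><td>-0.462425</td><td>0.286055</td><td>-0.332701</td><td>-0.081403</td><td>-0.820725</td><td>-0.436000</td><td>-0.227977</td><td>-1.411246</td><td>-0.232748</td></tr>\n",
"</tbody>\n",
"</table>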
\n", "
" ], "text/plain": [ " bedrooms bathrooms sqft_living sqft_lot floors sqft_above \\\n", "0 -0.400918 -0.459840 -0.533523 -0.184294 -0.889785 -0.243002 \n", "1 -0.400918 1.123994 0.919986 -0.129729 0.997519 1.399581 \n", "2 0.671773 0.490461 -0.049020 -0.167446 -0.889785 -0.860426 \n", "3 -0.400918 0.490461 -0.121180 -0.035583 -0.889785 0.222979 \n", "4 -0.400918 0.490461 -0.327352 -0.187215 -0.889785 -0.452693 \n", "\n", " sqft_basement lat long bedrooms^2 bathrooms^2 sqft_living^2 \\\n", "0 -0.670469 -0.261919 -1.179294 -0.462425 -0.498149 -0.435619 \n", "1 -0.670469 0.525365 0.289117 -0.462425 0.962623 0.551247 \n", "2 1.499619 0.720739 0.545733 0.533184 0.286055 -0.174327 \n", "3 -0.670469 0.066599 -0.088678 -0.462425 0.286055 -0.217531 \n", "4 0.154165 -1.411729 0.232092 -0.462425 0.286055 -0.332701 \n", "\n", " sqft_lot^2 floors^2 sqft_above^2 sqft_basement^2 lat^2 long^2 \n", "0 -0.081332 -0.820725 -0.317640 -0.429442 -0.263451 1.180094 \n", "1 -0.079773 0.882097 1.104202 -0.429442 0.524670 -0.289785 \n", "2 -0.080898 -0.820725 -0.625213 0.965746 0.720531 -0.546402 \n", "3 -0.076044 -0.820725 -0.003426 -0.429442 0.065197 0.088151 \n", "4 -0.081403 -0.820725 -0.436000 -0.227977 -1.411246 -0.232748 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def add_square_terms(df):\n", " df = df.copy()\n", " cols = df.columns.copy()\n", " for col in cols:\n", " df['{}^2'.format(col)] = df[col]**2\n", " return df\n", "\n", "X_train_df_2 = add_square_terms(X_train)\n", "X_val_df_2 = add_square_terms(X_val)\n", "\n", "# Standardizing our added coefficients\n", "cols = X_train_df_2.columns\n", "scaler = StandardScaler().fit(X_train_df_2)\n", "X_train_df_2 = pd.DataFrame(scaler.transform(X_train_df_2), columns=cols)\n", "X_val_df_2 = pd.DataFrame(scaler.transform(X_val_df_2), columns=cols)\n", "\n", "print(X_train_df.shape, X_train_df_2.shape)\n", "\n", "# Also check using the describe() function that the mean and standard 
deviations are the way we want them\n", "X_train_df_2.head()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
OLS Regression Results
Dep. Variable: y R-squared: 0.612
Model: OLS Adj. R-squared: 0.609
Method: Least Squares F-statistic: 220.8
Date: Mon, 07 Oct 2019 Prob (F-statistic): 0.00
Time: 15:13:28 Log-Likelihood: -2269.9
No. Observations: 2400 AIC: 4576.
Df Residuals: 2382 BIC: 4680.
Df Model: 17
Covariance Type: nonrobust
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err t P>|t| [0.025 0.975]
const -6.175e-12 0.013 -4.84e-10 1.000 -0.025 0.025
bedrooms -0.1271 0.058 -2.186 0.029 -0.241 -0.013
bathrooms 0.1537 0.060 2.569 0.010 0.036 0.271
sqft_living 0.3406 0.026 12.895 0.000 0.289 0.392
sqft_lot -0.0278 0.030 -0.920 0.358 -0.087 0.032
floors -0.1006 0.087 -1.151 0.250 -0.272 0.071
sqft_above 0.2460 0.036 6.809 0.000 0.175 0.317
sqft_basement 0.2587 0.033 7.758 0.000 0.193 0.324
lat 83.5852 8.613 9.705 0.000 66.696 100.474
long -7.0103 16.124 -0.435 0.664 -38.628 24.608
bedrooms^2 -0.0117 0.057 -0.207 0.836 -0.123 0.099
bathrooms^2 -0.1395 0.061 -2.293 0.022 -0.259 -0.020
sqft_living^2 0.2606 0.104 2.498 0.013 0.056 0.465
sqft_lot^2 0.0395 0.029 1.366 0.172 -0.017 0.096
floors^2 0.0449 0.083 0.539 0.590 -0.118 0.208
sqft_above^2 0.0384 0.105 0.366 0.714 -0.167 0.244
sqft_basement^2 -0.2640 0.049 -5.424 0.000 -0.359 -0.169
lat^2 -83.3483 8.612 -9.678 0.000 -100.237 -66.460
long^2 -6.8786 16.124 -0.427 0.670 -38.498 24.741
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Omnibus: 1594.128 Durbin-Watson: 2.011
Prob(Omnibus): 0.000 Jarque-Bera (JB): 41401.592
Skew: 2.739 Prob(JB): 0.00
Kurtosis: 22.596 Cond. No. 2.09e+16


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 3.68e-29. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: y R-squared: 0.612\n", "Model: OLS Adj. R-squared: 0.609\n", "Method: Least Squares F-statistic: 220.8\n", "Date: Mon, 07 Oct 2019 Prob (F-statistic): 0.00\n", "Time: 15:13:28 Log-Likelihood: -2269.9\n", "No. Observations: 2400 AIC: 4576.\n", "Df Residuals: 2382 BIC: 4680.\n", "Df Model: 17 \n", "Covariance Type: nonrobust \n", "===================================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "-----------------------------------------------------------------------------------\n", "const -6.175e-12 0.013 -4.84e-10 1.000 -0.025 0.025\n", "bedrooms -0.1271 0.058 -2.186 0.029 -0.241 -0.013\n", "bathrooms 0.1537 0.060 2.569 0.010 0.036 0.271\n", "sqft_living 0.3406 0.026 12.895 0.000 0.289 0.392\n", "sqft_lot -0.0278 0.030 -0.920 0.358 -0.087 0.032\n", "floors -0.1006 0.087 -1.151 0.250 -0.272 0.071\n", "sqft_above 0.2460 0.036 6.809 0.000 0.175 0.317\n", "sqft_basement 0.2587 0.033 7.758 0.000 0.193 0.324\n", "lat 83.5852 8.613 9.705 0.000 66.696 100.474\n", "long -7.0103 16.124 -0.435 0.664 -38.628 24.608\n", "bedrooms^2 -0.0117 0.057 -0.207 0.836 -0.123 0.099\n", "bathrooms^2 -0.1395 0.061 -2.293 0.022 -0.259 -0.020\n", "sqft_living^2 0.2606 0.104 2.498 0.013 0.056 0.465\n", "sqft_lot^2 0.0395 0.029 1.366 0.172 -0.017 0.096\n", "floors^2 0.0449 0.083 0.539 0.590 -0.118 0.208\n", "sqft_above^2 0.0384 0.105 0.366 0.714 -0.167 0.244\n", "sqft_basement^2 -0.2640 0.049 -5.424 0.000 -0.359 -0.169\n", "lat^2 -83.3483 8.612 -9.678 0.000 -100.237 -66.460\n", "long^2 -6.8786 16.124 -0.427 0.670 -38.498 24.741\n", "==============================================================================\n", "Omnibus: 1594.128 Durbin-Watson: 2.011\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 
41401.592\n", "Skew: 2.739 Prob(JB): 0.00\n", "Kurtosis: 22.596 Cond. No. 2.09e+16\n", "==============================================================================\n", "\n", "Warnings:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "[2] The smallest eigenvalue is 3.68e-29. This might indicate that there are\n", "strong multicollinearity problems or that the design matrix is singular.\n", "\"\"\"" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_2 = OLS(np.array(y_train).reshape(-1,1), sm.add_constant(X_train_df_2)).fit()\n", "model_2.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Three-Degree Polynomial Model" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2400, 9) (2400, 27)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
bedroomsbathroomssqft_livingsqft_lotfloorssqft_abovesqft_basementlatlongbedrooms^2bedrooms^3bathrooms^2bathrooms^3sqft_living^2sqft_living^3sqft_lot^2sqft_lot^3floors^2floors^3sqft_above^2sqft_above^3sqft_basement^2sqft_basement^3lat^2lat^3long^2long^3
0-0.400918-0.459840-0.533523-0.184294-0.889785-0.243002-0.670469-0.261919-1.179294-0.462425-0.433878-0.498149-0.388130-0.435619-0.219568-0.081332-0.052800-0.820725-0.716812-0.317640-0.259891-0.429442-0.212933-0.263451-0.2649821.180094-1.180893
1-0.4009181.1239940.919986-0.1297290.9975191.399581-0.6704690.5253650.289117-0.462425-0.4338780.9626230.6096670.5512470.166351-0.079773-0.0527740.8820970.7087721.1042020.616150-0.429442-0.2129330.5246700.523971-0.2897850.290452
20.6717730.490461-0.049020-0.167446-0.889785-0.8604261.4996190.7207390.5457330.5331840.3425440.2860550.085193-0.174327-0.140462-0.080898-0.052793-0.820725-0.716812-0.625213-0.3670270.9657460.3415780.7205310.720318-0.5464020.547071
3-0.4009180.490461-0.121180-0.035583-0.8897850.222979-0.6704690.066599-0.088678-0.462425-0.4338780.2860550.085193-0.217531-0.154904-0.076044-0.052686-0.820725-0.716812-0.003426-0.113103-0.429442-0.2129330.0651970.0637930.088151-0.087625
4-0.4009180.490461-0.327352-0.187215-0.889785-0.4526930.154165-1.4117290.232092-0.462425-0.4338780.2860550.085193-0.332701-0.190853-0.081403-0.052801-0.820725-0.716812-0.436000-0.306038-0.227977-0.182506-1.411246-1.410757-0.2327480.233404
\n", "
" ], "text/plain": [ " bedrooms bathrooms sqft_living sqft_lot floors sqft_above \\\n", "0 -0.400918 -0.459840 -0.533523 -0.184294 -0.889785 -0.243002 \n", "1 -0.400918 1.123994 0.919986 -0.129729 0.997519 1.399581 \n", "2 0.671773 0.490461 -0.049020 -0.167446 -0.889785 -0.860426 \n", "3 -0.400918 0.490461 -0.121180 -0.035583 -0.889785 0.222979 \n", "4 -0.400918 0.490461 -0.327352 -0.187215 -0.889785 -0.452693 \n", "\n", " sqft_basement lat long bedrooms^2 bedrooms^3 bathrooms^2 \\\n", "0 -0.670469 -0.261919 -1.179294 -0.462425 -0.433878 -0.498149 \n", "1 -0.670469 0.525365 0.289117 -0.462425 -0.433878 0.962623 \n", "2 1.499619 0.720739 0.545733 0.533184 0.342544 0.286055 \n", "3 -0.670469 0.066599 -0.088678 -0.462425 -0.433878 0.286055 \n", "4 0.154165 -1.411729 0.232092 -0.462425 -0.433878 0.286055 \n", "\n", " bathrooms^3 sqft_living^2 sqft_living^3 sqft_lot^2 sqft_lot^3 \\\n", "0 -0.388130 -0.435619 -0.219568 -0.081332 -0.052800 \n", "1 0.609667 0.551247 0.166351 -0.079773 -0.052774 \n", "2 0.085193 -0.174327 -0.140462 -0.080898 -0.052793 \n", "3 0.085193 -0.217531 -0.154904 -0.076044 -0.052686 \n", "4 0.085193 -0.332701 -0.190853 -0.081403 -0.052801 \n", "\n", " floors^2 floors^3 sqft_above^2 sqft_above^3 sqft_basement^2 \\\n", "0 -0.820725 -0.716812 -0.317640 -0.259891 -0.429442 \n", "1 0.882097 0.708772 1.104202 0.616150 -0.429442 \n", "2 -0.820725 -0.716812 -0.625213 -0.367027 0.965746 \n", "3 -0.820725 -0.716812 -0.003426 -0.113103 -0.429442 \n", "4 -0.820725 -0.716812 -0.436000 -0.306038 -0.227977 \n", "\n", " sqft_basement^3 lat^2 lat^3 long^2 long^3 \n", "0 -0.212933 -0.263451 -0.264982 1.180094 -1.180893 \n", "1 -0.212933 0.524670 0.523971 -0.289785 0.290452 \n", "2 0.341578 0.720531 0.720318 -0.546402 0.547071 \n", "3 -0.212933 0.065197 0.063793 0.088151 -0.087625 \n", "4 -0.182506 -1.411246 -1.410757 -0.232748 0.233404 " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# generalizing our function from 
above\n", "def add_square_and_cube_terms(df):\n", " df = df.copy()\n", " cols = df.columns.copy()\n", " for col in cols:\n", " df['{}^2'.format(col)] = df[col]**2\n", " df['{}^3'.format(col)] = df[col]**3\n", " return df\n", "\n", "X_train_df_3 = add_square_and_cube_terms(X_train)\n", "X_val_df_3 = add_square_and_cube_terms(X_val)\n", "\n", "# Standardizing our added coefficients\n", "cols = X_train_df_3.columns\n", "scaler = StandardScaler().fit(X_train_df_3)\n", "X_train_df_3 = pd.DataFrame(scaler.transform(X_train_df_3), columns=cols)\n", "X_val_df_3 = pd.DataFrame(scaler.transform(X_val_df_3), columns=cols)\n", "\n", "print(X_train_df.shape, X_train_df_3.shape)\n", "\n", "# Also check using the describe() function that the mean and standard deviations are the way we want them\n", "X_train_df_3.head()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
OLS Regression Results
Dep. Variable: y R-squared: 0.698
Model: OLS Adj. R-squared: 0.695
Method: Least Squares F-statistic: 211.2
Date: Mon, 07 Oct 2019 Prob (F-statistic): 0.00
Time: 15:13:28 Log-Likelihood: -1967.6
No. Observations: 2400 AIC: 3989.
Df Residuals: 2373 BIC: 4145.
Df Model: 26
Covariance Type: nonrobust
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err t P>|t| [0.025 0.975]
const 6.291e-09 0.011 5.58e-07 1.000 -0.022 0.022
bedrooms 0.2510 0.120 2.094 0.036 0.016 0.486
bathrooms -0.2267 0.120 -1.893 0.058 -0.461 0.008
sqft_living 0.2066 0.057 3.622 0.000 0.095 0.319
sqft_lot 0.0520 0.047 1.098 0.272 -0.041 0.145
floors 1.1766 0.624 1.886 0.059 -0.047 2.400
sqft_above 0.4473 0.078 5.715 0.000 0.294 0.601
sqft_basement -0.3982 0.060 -6.640 0.000 -0.516 -0.281
lat -6.976e+04 3592.874 -19.417 0.000 -7.68e+04 -6.27e+04
long 3.138e+04 8572.131 3.661 0.000 1.46e+04 4.82e+04
bedrooms^2 -0.5999 0.218 -2.757 0.006 -1.027 -0.173
bedrooms^3 0.2824 0.111 2.535 0.011 0.064 0.501
bathrooms^2 0.8180 0.216 3.779 0.000 0.394 1.243
bathrooms^3 -0.6577 0.119 -5.531 0.000 -0.891 -0.425
sqft_living^2 2.2922 0.273 8.394 0.000 1.757 2.828
sqft_living^3 -1.2107 0.219 -5.535 0.000 -1.640 -0.782
sqft_lot^2 -0.1480 0.123 -1.207 0.227 -0.388 0.092
sqft_lot^3 0.1319 0.088 1.491 0.136 -0.042 0.305
floors^2 -2.3337 1.147 -2.034 0.042 -4.583 -0.084
floors^3 1.1113 0.543 2.048 0.041 0.047 2.175
sqft_above^2 -2.1768 0.351 -6.194 0.000 -2.866 -1.488
sqft_above^3 1.3407 0.228 5.876 0.000 0.893 1.788
sqft_basement^2 0.0363 0.149 0.243 0.808 -0.256 0.328
sqft_basement^3 -0.1946 0.126 -1.546 0.122 -0.441 0.052
lat^2 1.397e+05 7188.240 19.429 0.000 1.26e+05 1.54e+05
lat^3 -6.99e+04 3595.377 -19.441 0.000 -7.69e+04 -6.28e+04
long^2 6.284e+04 1.72e+04 3.663 0.000 2.92e+04 9.65e+04
long^3 3.145e+04 8583.990 3.664 0.000 1.46e+04 4.83e+04
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Omnibus: 1337.688 Durbin-Watson: 1.993
Prob(Omnibus): 0.000 Jarque-Bera (JB): 29677.038
Skew: 2.170 Prob(JB): 0.00
Kurtosis: 19.672 Cond. No. 2.25e+15


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 4.47e-27. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: y R-squared: 0.698\n", "Model: OLS Adj. R-squared: 0.695\n", "Method: Least Squares F-statistic: 211.2\n", "Date: Mon, 07 Oct 2019 Prob (F-statistic): 0.00\n", "Time: 15:13:28 Log-Likelihood: -1967.6\n", "No. Observations: 2400 AIC: 3989.\n", "Df Residuals: 2373 BIC: 4145.\n", "Df Model: 26 \n", "Covariance Type: nonrobust \n", "===================================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "-----------------------------------------------------------------------------------\n", "const 6.291e-09 0.011 5.58e-07 1.000 -0.022 0.022\n", "bedrooms 0.2510 0.120 2.094 0.036 0.016 0.486\n", "bathrooms -0.2267 0.120 -1.893 0.058 -0.461 0.008\n", "sqft_living 0.2066 0.057 3.622 0.000 0.095 0.319\n", "sqft_lot 0.0520 0.047 1.098 0.272 -0.041 0.145\n", "floors 1.1766 0.624 1.886 0.059 -0.047 2.400\n", "sqft_above 0.4473 0.078 5.715 0.000 0.294 0.601\n", "sqft_basement -0.3982 0.060 -6.640 0.000 -0.516 -0.281\n", "lat -6.976e+04 3592.874 -19.417 0.000 -7.68e+04 -6.27e+04\n", "long 3.138e+04 8572.131 3.661 0.000 1.46e+04 4.82e+04\n", "bedrooms^2 -0.5999 0.218 -2.757 0.006 -1.027 -0.173\n", "bedrooms^3 0.2824 0.111 2.535 0.011 0.064 0.501\n", "bathrooms^2 0.8180 0.216 3.779 0.000 0.394 1.243\n", "bathrooms^3 -0.6577 0.119 -5.531 0.000 -0.891 -0.425\n", "sqft_living^2 2.2922 0.273 8.394 0.000 1.757 2.828\n", "sqft_living^3 -1.2107 0.219 -5.535 0.000 -1.640 -0.782\n", "sqft_lot^2 -0.1480 0.123 -1.207 0.227 -0.388 0.092\n", "sqft_lot^3 0.1319 0.088 1.491 0.136 -0.042 0.305\n", "floors^2 -2.3337 1.147 -2.034 0.042 -4.583 -0.084\n", "floors^3 1.1113 0.543 2.048 0.041 0.047 2.175\n", "sqft_above^2 -2.1768 0.351 -6.194 0.000 -2.866 -1.488\n", "sqft_above^3 1.3407 0.228 5.876 0.000 0.893 
1.788\n", "sqft_basement^2 0.0363 0.149 0.243 0.808 -0.256 0.328\n", "sqft_basement^3 -0.1946 0.126 -1.546 0.122 -0.441 0.052\n", "lat^2 1.397e+05 7188.240 19.429 0.000 1.26e+05 1.54e+05\n", "lat^3 -6.99e+04 3595.377 -19.441 0.000 -7.69e+04 -6.28e+04\n", "long^2 6.284e+04 1.72e+04 3.663 0.000 2.92e+04 9.65e+04\n", "long^3 3.145e+04 8583.990 3.664 0.000 1.46e+04 4.83e+04\n", "==============================================================================\n", "Omnibus: 1337.688 Durbin-Watson: 1.993\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 29677.038\n", "Skew: 2.170 Prob(JB): 0.00\n", "Kurtosis: 19.672 Cond. No. 2.25e+15\n", "==============================================================================\n", "\n", "Warnings:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "[2] The smallest eigenvalue is 4.47e-27. This might indicate that there are\n", "strong multicollinearity problems or that the design matrix is singular.\n", "\"\"\"" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_3 = OLS(np.array(y_train).reshape(-1,1), sm.add_constant(X_train_df_3)).fit()\n", "model_3.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## N-Degree Polynomial Model" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2400, 9) (2400, 72)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
bedroomsbathroomssqft_livingsqft_lotfloorssqft_abovesqft_basementlatlongbedrooms^2bedrooms^3bedrooms^4bedrooms^5bedrooms^6bedrooms^7bedrooms^8bathrooms^2bathrooms^3bathrooms^4bathrooms^5bathrooms^6bathrooms^7bathrooms^8sqft_living^2sqft_living^3sqft_living^4sqft_living^5sqft_living^6sqft_living^7sqft_living^8sqft_lot^2sqft_lot^3sqft_lot^4sqft_lot^5sqft_lot^6sqft_lot^7sqft_lot^8floors^2floors^3floors^4floors^5floors^6floors^7floors^8sqft_above^2sqft_above^3sqft_above^4sqft_above^5sqft_above^6sqft_above^7sqft_above^8sqft_basement^2sqft_basement^3sqft_basement^4sqft_basement^5sqft_basement^6sqft_basement^7sqft_basement^8lat^2lat^3lat^4lat^5lat^6lat^7lat^8long^2long^3long^4long^5long^6long^7long^8
0-0.400918-0.459840-0.533523-0.184294-0.889785-0.243002-0.670469-0.261919-1.179294-0.462425-0.433878-0.339413-0.231621-0.149138-0.097431-0.067607-0.498149-0.388130-0.241435-0.141594-0.088630-0.062342-0.049025-0.435619-0.219568-0.091216-0.246556-0.658169-1.5010691.377053-0.081332-0.052800-0.1164370.4325390.1235170.4880420.758862-0.820725-0.716812-0.591427-0.470475-0.371825-0.298830-0.247064-0.317640-0.259891-0.157166-0.178065-0.799711-1.5077971.422035-0.429442-0.212933-0.096048-0.050491-0.161975-0.0553120.108540-0.263451-0.264982-0.266511-0.268039-0.269565-0.271089-0.2726121.180094-1.1808931.181692-1.1824901.183286-1.1840821.184877
1-0.4009181.1239940.919986-0.1297290.9975191.399581-0.6704690.5253650.289117-0.462425-0.433878-0.339413-0.231621-0.149138-0.097431-0.0676070.9626230.6096670.2821970.0939360.009656-0.022846-0.0334830.5512470.1663510.0123180.119550-1.053598-1.0212760.389612-0.079773-0.052774-0.111868-0.8102790.8238150.1994801.0008710.8820970.7087720.5056380.3160050.1662000.059351-0.0124321.1042020.6161500.2315210.288260-1.225165-1.0325510.409224-0.429442-0.212933-0.096048-0.050491-0.161975-0.0553120.1085400.5246700.5239710.5232680.5225620.5218510.5211370.520419-0.2897850.290452-0.2911180.291784-0.2924500.293116-0.293781
20.6717730.490461-0.049020-0.167446-0.889785-0.8604261.4996190.7207390.5457330.5331840.3425440.1617190.040880-0.017513-0.038115-0.0419160.2860550.085193-0.024414-0.057144-0.058395-0.051989-0.045573-0.174327-0.140462-0.075159-0.2044410.0010850.687907-0.472558-0.080898-0.052793-0.115420-0.618477-1.4046830.374544-0.388685-0.820725-0.716812-0.591427-0.470475-0.371825-0.298830-0.247064-0.625213-0.367027-0.183625-0.1955070.1031631.0581800.0323490.9657460.3415780.061803-0.0105180.6096241.1727410.7145870.7205310.7203180.7201010.7198800.7196550.7194250.719191-0.5464020.547071-0.5477390.548406-0.5490720.549737-0.550401
3-0.4009180.490461-0.121180-0.035583-0.8897850.222979-0.6704690.066599-0.088678-0.462425-0.433878-0.339413-0.231621-0.149138-0.097431-0.0676070.2860550.085193-0.024414-0.057144-0.058395-0.051989-0.045573-0.217531-0.154904-0.078379-0.2136650.890215-1.6281440.193797-0.076044-0.052686-0.090616-1.003240-1.0149521.084197-0.263787-0.820725-0.716812-0.591427-0.470475-0.371825-0.298830-0.247064-0.003426-0.113103-0.108971-0.1361700.866240-1.6336670.208379-0.429442-0.212933-0.096048-0.050491-0.161975-0.0553120.1085400.0651970.0637930.0623890.0609840.0595770.0581700.0567610.088151-0.0876250.087098-0.0865700.086042-0.0855140.084986
4-0.4009180.490461-0.327352-0.187215-0.889785-0.4526930.154165-1.4117290.232092-0.462425-0.433878-0.339413-0.231621-0.149138-0.097431-0.0676070.2860550.085193-0.024414-0.057144-0.058395-0.051989-0.045573-0.332701-0.190853-0.085868-0.233738-1.0942000.6916541.372554-0.081403-0.052801-0.1165850.054276-0.8466030.175618-1.669502-0.820725-0.716812-0.591427-0.470475-0.371825-0.298830-0.247064-0.436000-0.306038-0.169774-0.1871591.4245081.642941-0.565807-0.227977-0.182506-0.092756-0.050174-0.1596520.307214-2.293748-1.411246-1.410757-1.410261-1.409758-1.409249-1.408733-1.408211-0.2327480.233404-0.2340610.234716-0.2353720.236027-0.236682
\n", "
" ], "text/plain": [ " bedrooms bathrooms sqft_living sqft_lot floors sqft_above \\\n", "0 -0.400918 -0.459840 -0.533523 -0.184294 -0.889785 -0.243002 \n", "1 -0.400918 1.123994 0.919986 -0.129729 0.997519 1.399581 \n", "2 0.671773 0.490461 -0.049020 -0.167446 -0.889785 -0.860426 \n", "3 -0.400918 0.490461 -0.121180 -0.035583 -0.889785 0.222979 \n", "4 -0.400918 0.490461 -0.327352 -0.187215 -0.889785 -0.452693 \n", "\n", " sqft_basement lat long bedrooms^2 bedrooms^3 bedrooms^4 \\\n", "0 -0.670469 -0.261919 -1.179294 -0.462425 -0.433878 -0.339413 \n", "1 -0.670469 0.525365 0.289117 -0.462425 -0.433878 -0.339413 \n", "2 1.499619 0.720739 0.545733 0.533184 0.342544 0.161719 \n", "3 -0.670469 0.066599 -0.088678 -0.462425 -0.433878 -0.339413 \n", "4 0.154165 -1.411729 0.232092 -0.462425 -0.433878 -0.339413 \n", "\n", " bedrooms^5 bedrooms^6 bedrooms^7 bedrooms^8 bathrooms^2 bathrooms^3 \\\n", "0 -0.231621 -0.149138 -0.097431 -0.067607 -0.498149 -0.388130 \n", "1 -0.231621 -0.149138 -0.097431 -0.067607 0.962623 0.609667 \n", "2 0.040880 -0.017513 -0.038115 -0.041916 0.286055 0.085193 \n", "3 -0.231621 -0.149138 -0.097431 -0.067607 0.286055 0.085193 \n", "4 -0.231621 -0.149138 -0.097431 -0.067607 0.286055 0.085193 \n", "\n", " bathrooms^4 bathrooms^5 bathrooms^6 bathrooms^7 bathrooms^8 \\\n", "0 -0.241435 -0.141594 -0.088630 -0.062342 -0.049025 \n", "1 0.282197 0.093936 0.009656 -0.022846 -0.033483 \n", "2 -0.024414 -0.057144 -0.058395 -0.051989 -0.045573 \n", "3 -0.024414 -0.057144 -0.058395 -0.051989 -0.045573 \n", "4 -0.024414 -0.057144 -0.058395 -0.051989 -0.045573 \n", "\n", " sqft_living^2 sqft_living^3 sqft_living^4 sqft_living^5 sqft_living^6 \\\n", "0 -0.435619 -0.219568 -0.091216 -0.246556 -0.658169 \n", "1 0.551247 0.166351 0.012318 0.119550 -1.053598 \n", "2 -0.174327 -0.140462 -0.075159 -0.204441 0.001085 \n", "3 -0.217531 -0.154904 -0.078379 -0.213665 0.890215 \n", "4 -0.332701 -0.190853 -0.085868 -0.233738 -1.094200 \n", "\n", " sqft_living^7 sqft_living^8 
sqft_lot^2 sqft_lot^3 sqft_lot^4 \\\n", "0 -1.501069 1.377053 -0.081332 -0.052800 -0.116437 \n", "1 -1.021276 0.389612 -0.079773 -0.052774 -0.111868 \n", "2 0.687907 -0.472558 -0.080898 -0.052793 -0.115420 \n", "3 -1.628144 0.193797 -0.076044 -0.052686 -0.090616 \n", "4 0.691654 1.372554 -0.081403 -0.052801 -0.116585 \n", "\n", " sqft_lot^5 sqft_lot^6 sqft_lot^7 sqft_lot^8 floors^2 floors^3 \\\n", "0 0.432539 0.123517 0.488042 0.758862 -0.820725 -0.716812 \n", "1 -0.810279 0.823815 0.199480 1.000871 0.882097 0.708772 \n", "2 -0.618477 -1.404683 0.374544 -0.388685 -0.820725 -0.716812 \n", "3 -1.003240 -1.014952 1.084197 -0.263787 -0.820725 -0.716812 \n", "4 0.054276 -0.846603 0.175618 -1.669502 -0.820725 -0.716812 \n", "\n", " floors^4 floors^5 floors^6 floors^7 floors^8 sqft_above^2 \\\n", "0 -0.591427 -0.470475 -0.371825 -0.298830 -0.247064 -0.317640 \n", "1 0.505638 0.316005 0.166200 0.059351 -0.012432 1.104202 \n", "2 -0.591427 -0.470475 -0.371825 -0.298830 -0.247064 -0.625213 \n", "3 -0.591427 -0.470475 -0.371825 -0.298830 -0.247064 -0.003426 \n", "4 -0.591427 -0.470475 -0.371825 -0.298830 -0.247064 -0.436000 \n", "\n", " sqft_above^3 sqft_above^4 sqft_above^5 sqft_above^6 sqft_above^7 \\\n", "0 -0.259891 -0.157166 -0.178065 -0.799711 -1.507797 \n", "1 0.616150 0.231521 0.288260 -1.225165 -1.032551 \n", "2 -0.367027 -0.183625 -0.195507 0.103163 1.058180 \n", "3 -0.113103 -0.108971 -0.136170 0.866240 -1.633667 \n", "4 -0.306038 -0.169774 -0.187159 1.424508 1.642941 \n", "\n", " sqft_above^8 sqft_basement^2 sqft_basement^3 sqft_basement^4 \\\n", "0 1.422035 -0.429442 -0.212933 -0.096048 \n", "1 0.409224 -0.429442 -0.212933 -0.096048 \n", "2 0.032349 0.965746 0.341578 0.061803 \n", "3 0.208379 -0.429442 -0.212933 -0.096048 \n", "4 -0.565807 -0.227977 -0.182506 -0.092756 \n", "\n", " sqft_basement^5 sqft_basement^6 sqft_basement^7 sqft_basement^8 \\\n", "0 -0.050491 -0.161975 -0.055312 0.108540 \n", "1 -0.050491 -0.161975 -0.055312 0.108540 \n", "2 -0.010518 
0.609624 1.172741 0.714587 \n", "3 -0.050491 -0.161975 -0.055312 0.108540 \n", "4 -0.050174 -0.159652 0.307214 -2.293748 \n", "\n", " lat^2 lat^3 lat^4 lat^5 lat^6 lat^7 lat^8 \\\n", "0 -0.263451 -0.264982 -0.266511 -0.268039 -0.269565 -0.271089 -0.272612 \n", "1 0.524670 0.523971 0.523268 0.522562 0.521851 0.521137 0.520419 \n", "2 0.720531 0.720318 0.720101 0.719880 0.719655 0.719425 0.719191 \n", "3 0.065197 0.063793 0.062389 0.060984 0.059577 0.058170 0.056761 \n", "4 -1.411246 -1.410757 -1.410261 -1.409758 -1.409249 -1.408733 -1.408211 \n", "\n", " long^2 long^3 long^4 long^5 long^6 long^7 long^8 \n", "0 1.180094 -1.180893 1.181692 -1.182490 1.183286 -1.184082 1.184877 \n", "1 -0.289785 0.290452 -0.291118 0.291784 -0.292450 0.293116 -0.293781 \n", "2 -0.546402 0.547071 -0.547739 0.548406 -0.549072 0.549737 -0.550401 \n", "3 0.088151 -0.087625 0.087098 -0.086570 0.086042 -0.085514 0.084986 \n", "4 -0.232748 0.233404 -0.234061 0.234716 -0.235372 0.236027 -0.236682 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Generalizing our function from above to polynomial terms up to degree N\n", "def add_higher_order_polynomial_terms(df, N=7):\n", " df = df.copy()\n", " cols = df.columns.copy()\n", " for col in cols:\n", " for i in range(2, N+1):\n", " df['{}^{}'.format(col, i)] = df[col]**i\n", " return df\n", "\n", "N = 8\n", "X_train_df_N = add_higher_order_polynomial_terms(X_train,N)\n", "X_val_df_N = add_higher_order_polynomial_terms(X_val,N)\n", "\n", "# Standardizing all features (original and added polynomial terms), fitting the scaler on the training set only\n", "cols = X_train_df_N.columns\n", "scaler = StandardScaler().fit(X_train_df_N)\n", "X_train_df_N = pd.DataFrame(scaler.transform(X_train_df_N), columns=cols)\n", "X_val_df_N = pd.DataFrame(scaler.transform(X_val_df_N), columns=cols)\n", "\n", "print(X_train_df.shape, X_train_df_N.shape)\n", "\n", "# Also check using the describe() function that the mean and standard deviations are the way we want them\n", "X_train_df_N.head()" ] }, { "cell_type": "code",
"execution_count": 21, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
OLS Regression Results
Dep. Variable: y R-squared: 0.738
Model: OLS Adj. R-squared: 0.732
Method: Least Squares F-statistic: 106.4
Date: Mon, 07 Oct 2019 Prob (F-statistic): 0.00
Time: 15:13:28 Log-Likelihood: -1796.1
No. Observations: 2400 AIC: 3718.
Df Residuals: 2337 BIC: 4083.
Df Model: 62
Covariance Type: nonrobust
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err t P>|t| [0.025 0.975]
const 0.0790 0.044 1.788 0.074 -0.008 0.166
bedrooms 0.7485 1.667 0.449 0.653 -2.520 4.017
bathrooms 4.0292 2.090 1.928 0.054 -0.069 8.127
sqft_living -1.825e+10 8.44e+09 -2.161 0.031 -3.48e+10 -1.69e+09
sqft_lot 0.1034 0.047 2.200 0.028 0.011 0.196
floors 5.427e+09 2.53e+09 2.146 0.032 4.67e+08 1.04e+10
sqft_above 1.615e+10 7.47e+09 2.161 0.031 1.49e+09 3.08e+10
sqft_basement 8.668e+09 4.01e+09 2.161 0.031 8.02e+08 1.65e+10
lat -6.58e+11 1.11e+11 -5.907 0.000 -8.76e+11 -4.4e+11
long 1.891e+11 1.71e+11 1.105 0.269 -1.47e+11 5.25e+11
bedrooms^2 -13.8231 16.496 -0.838 0.402 -46.172 18.526
bedrooms^3 85.7910 75.252 1.140 0.254 -61.777 233.359
bedrooms^4 -286.3415 209.397 -1.367 0.172 -696.964 124.281
bedrooms^5 585.7759 382.024 1.533 0.125 -163.365 1334.917
bedrooms^6 -724.1268 437.651 -1.655 0.098 -1582.351 134.097
bedrooms^7 487.2274 279.519 1.743 0.081 -60.903 1035.358
bedrooms^8 -135.4299 74.938 -1.807 0.071 -282.382 11.522
bathrooms^2 -50.0310 17.008 -2.942 0.003 -83.384 -16.678
bathrooms^3 271.5388 78.495 3.459 0.001 117.612 425.465
bathrooms^4 -902.0249 244.292 -3.692 0.000 -1381.076 -422.974
bathrooms^5 1860.8234 493.746 3.769 0.000 892.598 2829.048
bathrooms^6 -2254.0226 602.565 -3.741 0.000 -3435.640 -1072.405
bathrooms^7 1451.5753 399.258 3.636 0.000 668.638 2234.513
bathrooms^8 -381.7826 109.773 -3.478 0.001 -597.045 -166.520
sqft_living^2 3.2337 1.155 2.800 0.005 0.969 5.499
sqft_living^3 -4.1768 2.059 -2.029 0.043 -8.214 -0.139
sqft_living^4 3.5216 1.694 2.079 0.038 0.199 6.844
sqft_living^5 -0.0344 0.018 -1.869 0.062 -0.071 0.002
sqft_living^6 0.0243 0.013 1.815 0.070 -0.002 0.051
sqft_living^7 0.0117 0.014 0.856 0.392 -0.015 0.039
sqft_living^8 -0.0216 0.014 -1.575 0.115 -0.049 0.005
sqft_lot^2 -0.2332 0.121 -1.924 0.054 -0.471 0.004
sqft_lot^3 0.1763 0.087 2.023 0.043 0.005 0.347
sqft_lot^4 -0.0156 0.012 -1.337 0.181 -0.038 0.007
sqft_lot^5 0.0011 0.011 0.101 0.920 -0.020 0.022
sqft_lot^6 0.0004 0.011 0.037 0.970 -0.021 0.021
sqft_lot^7 -0.0104 0.011 -0.965 0.335 -0.031 0.011
sqft_lot^8 0.0107 0.011 0.993 0.321 -0.010 0.032
floors^2 -1.676e+10 7.8e+09 -2.147 0.032 -3.21e+10 -1.45e+09
floors^3 1.529e+10 7.1e+09 2.155 0.031 1.38e+09 2.92e+10
floors^4 4.149e+09 2.27e+09 1.830 0.067 -2.97e+08 8.59e+09
floors^5 -1.176e+10 5.92e+09 -1.986 0.047 -2.34e+10 -1.49e+08
floors^6 -3.924e+09 1.94e+09 -2.024 0.043 -7.73e+09 -1.22e+08
floors^7 1.239e+10 5.63e+09 2.200 0.028 1.34e+09 2.34e+10
floors^8 -4.878e+09 2.24e+09 -2.181 0.029 -9.27e+09 -4.91e+08
sqft_above^2 -1.5414 1.287 -1.198 0.231 -4.065 0.982
sqft_above^3 0.7910 1.763 0.449 0.654 -2.666 4.248
sqft_above^4 0.1489 1.039 0.143 0.886 -1.889 2.187
sqft_above^5 0.0015 0.017 0.090 0.928 -0.031 0.034
sqft_above^6 -0.0184 0.013 -1.368 0.171 -0.045 0.008
sqft_above^7 -0.0146 0.014 -1.066 0.287 -0.041 0.012
sqft_above^8 -0.0068 0.014 -0.496 0.620 -0.034 0.020
sqft_basement^2 1.9238 0.798 2.412 0.016 0.360 3.488
sqft_basement^3 -6.6967 2.368 -2.828 0.005 -11.341 -2.053
sqft_basement^4 12.2596 4.045 3.031 0.002 4.327 20.192
sqft_basement^5 -8.5928 2.694 -3.190 0.001 -13.875 -3.311
sqft_basement^6 0.0144 0.012 1.199 0.231 -0.009 0.038
sqft_basement^7 0.0116 0.011 1.059 0.290 -0.010 0.033
sqft_basement^8 -0.0086 0.011 -0.783 0.433 -0.030 0.013
lat^2 2.588e+12 5.62e+11 4.601 0.000 1.49e+12 3.69e+12
lat^3 -3.378e+12 1.13e+12 -2.991 0.003 -5.59e+12 -1.16e+12
lat^4 1.107e+12 1.04e+12 1.067 0.286 -9.28e+11 3.14e+12
lat^5 6.175e+11 2.07e+11 2.986 0.003 2.12e+11 1.02e+12
lat^6 2.786e+11 4.35e+11 0.640 0.522 -5.75e+11 1.13e+12
lat^7 -8.725e+11 3.57e+11 -2.447 0.014 -1.57e+12 -1.73e+11
lat^8 3.168e+11 8.68e+10 3.651 0.000 1.47e+11 4.87e+11
long^2 4.566e+11 5.54e+11 0.824 0.410 -6.3e+11 1.54e+12
long^3 -1.043e+11 7.18e+11 -0.145 0.885 -1.51e+12 1.3e+12
long^4 -7.335e+11 1.2e+12 -0.613 0.540 -3.08e+12 1.61e+12
long^5 6.222e+11 1.23e+12 0.507 0.612 -1.78e+12 3.03e+12
long^6 2.347e+12 1.36e+12 1.725 0.085 -3.2e+11 5.01e+12
long^7 1.831e+12 1.01e+12 1.809 0.071 -1.54e+11 3.82e+12
long^8 4.676e+11 2.65e+11 1.764 0.078 -5.22e+10 9.87e+11
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Omnibus: 1365.789 Durbin-Watson: 2.050
Prob(Omnibus): 0.000 Jarque-Bera (JB): 31897.875
Skew: 2.217 Prob(JB): 0.00
Kurtosis: 20.301 Cond. No. 1.47e+16


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.77e-28. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: y R-squared: 0.738\n", "Model: OLS Adj. R-squared: 0.732\n", "Method: Least Squares F-statistic: 106.4\n", "Date: Mon, 07 Oct 2019 Prob (F-statistic): 0.00\n", "Time: 15:13:28 Log-Likelihood: -1796.1\n", "No. Observations: 2400 AIC: 3718.\n", "Df Residuals: 2337 BIC: 4083.\n", "Df Model: 62 \n", "Covariance Type: nonrobust \n", "===================================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "-----------------------------------------------------------------------------------\n", "const 0.0790 0.044 1.788 0.074 -0.008 0.166\n", "bedrooms 0.7485 1.667 0.449 0.653 -2.520 4.017\n", "bathrooms 4.0292 2.090 1.928 0.054 -0.069 8.127\n", "sqft_living -1.825e+10 8.44e+09 -2.161 0.031 -3.48e+10 -1.69e+09\n", "sqft_lot 0.1034 0.047 2.200 0.028 0.011 0.196\n", "floors 5.427e+09 2.53e+09 2.146 0.032 4.67e+08 1.04e+10\n", "sqft_above 1.615e+10 7.47e+09 2.161 0.031 1.49e+09 3.08e+10\n", "sqft_basement 8.668e+09 4.01e+09 2.161 0.031 8.02e+08 1.65e+10\n", "lat -6.58e+11 1.11e+11 -5.907 0.000 -8.76e+11 -4.4e+11\n", "long 1.891e+11 1.71e+11 1.105 0.269 -1.47e+11 5.25e+11\n", "bedrooms^2 -13.8231 16.496 -0.838 0.402 -46.172 18.526\n", "bedrooms^3 85.7910 75.252 1.140 0.254 -61.777 233.359\n", "bedrooms^4 -286.3415 209.397 -1.367 0.172 -696.964 124.281\n", "bedrooms^5 585.7759 382.024 1.533 0.125 -163.365 1334.917\n", "bedrooms^6 -724.1268 437.651 -1.655 0.098 -1582.351 134.097\n", "bedrooms^7 487.2274 279.519 1.743 0.081 -60.903 1035.358\n", "bedrooms^8 -135.4299 74.938 -1.807 0.071 -282.382 11.522\n", "bathrooms^2 -50.0310 17.008 -2.942 0.003 -83.384 -16.678\n", "bathrooms^3 271.5388 78.495 3.459 0.001 117.612 425.465\n", "bathrooms^4 -902.0249 244.292 -3.692 0.000 -1381.076 
-422.974\n", "bathrooms^5 1860.8234 493.746 3.769 0.000 892.598 2829.048\n", "bathrooms^6 -2254.0226 602.565 -3.741 0.000 -3435.640 -1072.405\n", "bathrooms^7 1451.5753 399.258 3.636 0.000 668.638 2234.513\n", "bathrooms^8 -381.7826 109.773 -3.478 0.001 -597.045 -166.520\n", "sqft_living^2 3.2337 1.155 2.800 0.005 0.969 5.499\n", "sqft_living^3 -4.1768 2.059 -2.029 0.043 -8.214 -0.139\n", "sqft_living^4 3.5216 1.694 2.079 0.038 0.199 6.844\n", "sqft_living^5 -0.0344 0.018 -1.869 0.062 -0.071 0.002\n", "sqft_living^6 0.0243 0.013 1.815 0.070 -0.002 0.051\n", "sqft_living^7 0.0117 0.014 0.856 0.392 -0.015 0.039\n", "sqft_living^8 -0.0216 0.014 -1.575 0.115 -0.049 0.005\n", "sqft_lot^2 -0.2332 0.121 -1.924 0.054 -0.471 0.004\n", "sqft_lot^3 0.1763 0.087 2.023 0.043 0.005 0.347\n", "sqft_lot^4 -0.0156 0.012 -1.337 0.181 -0.038 0.007\n", "sqft_lot^5 0.0011 0.011 0.101 0.920 -0.020 0.022\n", "sqft_lot^6 0.0004 0.011 0.037 0.970 -0.021 0.021\n", "sqft_lot^7 -0.0104 0.011 -0.965 0.335 -0.031 0.011\n", "sqft_lot^8 0.0107 0.011 0.993 0.321 -0.010 0.032\n", "floors^2 -1.676e+10 7.8e+09 -2.147 0.032 -3.21e+10 -1.45e+09\n", "floors^3 1.529e+10 7.1e+09 2.155 0.031 1.38e+09 2.92e+10\n", "floors^4 4.149e+09 2.27e+09 1.830 0.067 -2.97e+08 8.59e+09\n", "floors^5 -1.176e+10 5.92e+09 -1.986 0.047 -2.34e+10 -1.49e+08\n", "floors^6 -3.924e+09 1.94e+09 -2.024 0.043 -7.73e+09 -1.22e+08\n", "floors^7 1.239e+10 5.63e+09 2.200 0.028 1.34e+09 2.34e+10\n", "floors^8 -4.878e+09 2.24e+09 -2.181 0.029 -9.27e+09 -4.91e+08\n", "sqft_above^2 -1.5414 1.287 -1.198 0.231 -4.065 0.982\n", "sqft_above^3 0.7910 1.763 0.449 0.654 -2.666 4.248\n", "sqft_above^4 0.1489 1.039 0.143 0.886 -1.889 2.187\n", "sqft_above^5 0.0015 0.017 0.090 0.928 -0.031 0.034\n", "sqft_above^6 -0.0184 0.013 -1.368 0.171 -0.045 0.008\n", "sqft_above^7 -0.0146 0.014 -1.066 0.287 -0.041 0.012\n", "sqft_above^8 -0.0068 0.014 -0.496 0.620 -0.034 0.020\n", "sqft_basement^2 1.9238 0.798 2.412 0.016 0.360 3.488\n", "sqft_basement^3 
-6.6967 2.368 -2.828 0.005 -11.341 -2.053\n", "sqft_basement^4 12.2596 4.045 3.031 0.002 4.327 20.192\n", "sqft_basement^5 -8.5928 2.694 -3.190 0.001 -13.875 -3.311\n", "sqft_basement^6 0.0144 0.012 1.199 0.231 -0.009 0.038\n", "sqft_basement^7 0.0116 0.011 1.059 0.290 -0.010 0.033\n", "sqft_basement^8 -0.0086 0.011 -0.783 0.433 -0.030 0.013\n", "lat^2 2.588e+12 5.62e+11 4.601 0.000 1.49e+12 3.69e+12\n", "lat^3 -3.378e+12 1.13e+12 -2.991 0.003 -5.59e+12 -1.16e+12\n", "lat^4 1.107e+12 1.04e+12 1.067 0.286 -9.28e+11 3.14e+12\n", "lat^5 6.175e+11 2.07e+11 2.986 0.003 2.12e+11 1.02e+12\n", "lat^6 2.786e+11 4.35e+11 0.640 0.522 -5.75e+11 1.13e+12\n", "lat^7 -8.725e+11 3.57e+11 -2.447 0.014 -1.57e+12 -1.73e+11\n", "lat^8 3.168e+11 8.68e+10 3.651 0.000 1.47e+11 4.87e+11\n", "long^2 4.566e+11 5.54e+11 0.824 0.410 -6.3e+11 1.54e+12\n", "long^3 -1.043e+11 7.18e+11 -0.145 0.885 -1.51e+12 1.3e+12\n", "long^4 -7.335e+11 1.2e+12 -0.613 0.540 -3.08e+12 1.61e+12\n", "long^5 6.222e+11 1.23e+12 0.507 0.612 -1.78e+12 3.03e+12\n", "long^6 2.347e+12 1.36e+12 1.725 0.085 -3.2e+11 5.01e+12\n", "long^7 1.831e+12 1.01e+12 1.809 0.071 -1.54e+11 3.82e+12\n", "long^8 4.676e+11 2.65e+11 1.764 0.078 -5.22e+10 9.87e+11\n", "==============================================================================\n", "Omnibus: 1365.789 Durbin-Watson: 2.050\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 31897.875\n", "Skew: 2.217 Prob(JB): 0.00\n", "Kurtosis: 20.301 Cond. No. 1.47e+16\n", "==============================================================================\n", "\n", "Warnings:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "[2] The smallest eigenvalue is 1.77e-28. 
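This might indicate that there are\n", "strong multicollinearity problems or that the design matrix is singular.\n", "\"\"\"" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Fit OLS on the training set using the polynomial features plus an intercept column\n", "model_N = OLS(np.array(y_train).reshape(-1,1), sm.add_constant(X_train_df_N)).fit()\n", "model_N.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look at the warnings at the bottom of the summary: the condition number is about 1.47e+16, the smallest eigenvalue is about 1.77e-28, and several coefficients (for example the `lat` and `long` powers) are on the order of 1e11 to 1e12. This suggests strong multicollinearity among the polynomial terms, so the individual coefficient estimates should not be trusted.\n", "\n", "You can also create a model with interaction terms or any other higher-order polynomial terms of your choice. \n", "**Note:** Can you see why it would be useful to write a function that takes in a dataframe and a degree and creates polynomial terms up to that degree? This is what we have you do in your homework!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Regularization\n", "\n", "## What is Regularization and why should I care?\n", "\n", "When we have a lot of predictors, we need to worry about overfitting: the model can start fitting noise in the training set and then generalize poorly to data it has not seen. Let's check this out:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "image/png": 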
"iVBORw0KGgoAAAANSUhEUgAAAgAAAAGJCAYAAAD8L4t3AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nOzdeZhT5cH///c5yWSbYViGQQQUweV2FxfcEHFDZatPtbZq3Vr7uLTVuoO/52v1ah8fccFdq9aqrUsXly5s4m5VtC0iWrXcoCIii8IAwkySyWT5/ZGAM8Psk5mTyXxe1zXXTE5OTj4nQe9P7nOSOJlMBhEREeldXK8DiIiISPdTARAREemFVABERER6IRUAERGRXkgFQEREpBdSARAREemFVABERER6Ib/XAUR6OmPMYcBtQAJYBZxtra3zNpWISMs0AyDSecuBY6y144BPgZM8ziMi0irNAIh0krV2Vb2LSSDtVRYRkbZSARBphTGmP7AeqCE7a7YW+IW19jeN1hsBTABu6PaQIiLtpEMAIq0bBayz1pZZayPANcADxpiBW1YwxpQDvwXOstYmPMpZUIwxRxljPvDw/g8yxjzdyjptymiM+Y4x5tW8hRMpAJoBEGndKGBhvcuvAT6gP7DOGOMHfg9cb621HuSTJlhrFwDf8TqHSKFSARBp3f7AOwDGmH7AjbnLH+euPx04BPi5MebnwK+stX+svwFjTBnwCLAr2XME3gEusNamjTE/BK4AUsA64Bxr7QpjzPnAJbnlXwI/BYYAd5I9HFEGjAaOB/4fEACiwJXW2rca3f+TwDvW2hm5yxcBRwHnNZer0e2PAm4ie8Lj7kAMONda+5+mclprlzS6/a+Br6y1/5O7fCZwSm5fbiB78uTeQEnu/t9sbru5LDcCnwMm91hMz61rgGestZfl1rvHWru3McYFbgcOBfoADvAjWmCM+QXwfaAKWNrouinNPebGmGm5x3Uz8Hfgv4Bzacfz1tL2RfJFhwBEWjcK+JkxZhOwARgEnGitzQBYax+z1g601h6V+/ljE9v4NtDHWjuK7P/8AUYaY/YjO7CeaK3dF/gb8D/GmGOAq4GjrbX7AU8CfyE7cO0NnJ5bf0fg/4CJ1tr9gfOBZ40xpY3u/9dkB6Etzs0tazJXM4/DQcDduft9BHisuZzGGKfRbe8FfpCbLSGX8/7c34cAM3L5HwH+rw3bHQ1Mz+XeRPawzCTgAOAnxpghje7/ELLl6TBr7Z5kD9dMa2Y/McacRLagjAIOB/rWu25XmnnMjTEnkH1sRwMHki0bW7TpeWtp+83lFekIFQCRFhhjgsAewL7W2nKyU8qHAu19n/8bwF6548jTgDustR8DxwLzrLUrAKy1d1hrLwROBP5orV2bW/4oMBTYCVhhrV2e2+54YHvgJWPMIuAJsq/kd2l0/68Codxx8T2BSuClFnI15T1r7eu5vx8mOzNyegs5t7LWLgKWAZOMMXuQHYyfz129PHc9ZA+1DGhl/wGWWWvfzf39CfCKtTZhrV1HthAMaHT/b5F9RX2BMeZWss9jWTP7CXAc8Ky1drO1Npnb3y1aeswnAk9ZazfmCuK99W7X1uetrc+pSKfoEIBIy/YGaslOUWOtfcYYcx3ZV4cPt3TD+qy1y4wxu5Cddj8GeDE3xZ0EMlvWM8aEgeFkzzFofDKhQ3aKvLreMh/wkrX2e/W2sQPZDySqf/8ZY8xvgLNz+/Ob3ADVZC5r7cwmdiPZKAtkX0RkGq23JWdj9wI/BJYAD+YyQfZwwhaZ3O1b2n9y+1Bfi4XMGDOJ7BT8DOCvwGLgzJZuwzf7CA33vaXHPNnodql6f7f1eTuyhetE8kYzACIt2x/4YMt0f84c4Fvt2UjumPsjwPPW2qnAPLLT1a8Axxljts+tegFwM/AccJoxpjJ3+x+QPRbd+NX5S8Dxxpjdc+tNBN4Hwk3EeDSX+9RclpZyNWWUMWbf3N/
nA/OBP7QxJ8DTZB/P79B6eWrr/rfVeGCmtfZXwAKyx+V9Law/FzjVGNMvd/7AWfWua+kxnw2cYozZcsjgPLYtSK1toz3PqUiHqQCItGwU2f/51vccMN4YE2rHdn5HdsD5yBjzDtljyndZa/8NXAU8Z4x5j+zU94XW2hfInrT2sjHmQ+AcYDKNPmTIWvsR2cH4D7nb/xL4lrW2/qvNLeuuITvF/n69Dy9qMlcz+7AGuMEY82+yA+hZzeVsfBJh7v4TZEvA/NxUfbPas902uh84Kpd9IdnDBiNo5v+B1to5ZEvKAuAfwNf1rmv2MbfWvkz23Iq3jDELyD6e0Sa239I22vycinSGk8k0VU5FRL5R/4z6TmyjlOxZ8T+x1r6dr2yFxBhzEHC4tfau3OXLgUPqT+eLFAqdAyAiXS53dvzvgfuKdfDPWQJMzZ3fkSH7VsXzvY0k0jTNAIiIiPRCOgdARESkF1IBEBER6YV60zkAQbKfzrWahu/NFRERKUY+sh8q9S+2/eyMXlUARgOvt7qWiIhIcRlL9lM/G+hNBWA1wIYNNaTT+TnxsaKijKqq4n9rrvazuGg/i4v2s7jkcz9d16F//1LIjX+N9aYCkAJIpzN5KwBbttcbaD+Li/azuGg/i0sX7GeTh711EqCIiEgvpAIgIiLSC6kAiIiI9EIqACIiIr2QCoCIiEgvpAIgIiLSC6kAiIiI9EIqACIiIr2QCoCIiEgv1Js+CVBERKQgvfXhGp597RPWb6plQHmQk8ftzGF7De7S+1QBEBER8dBbH67ht3MXk0imAajaVMtv5y4G6NISoAIgIiI9XiaTIZX7rpdUOkO63uWtyxr/bnyb3O9UOt30bTINL7d1e81uN7e9JSs2kkw1/Pz/RDLNs699ogIgIiKtS2daGOxaHcDSW9dJpdo+qDV1/bb3mc4NgB3PhwN1yXQTt80uyxTA9wT5XAef6+DW+731b6fhdT7X3fp348F/i6pNtV2aVwVARIpGJpMdCFLpDPHaJNF4stEA1cQruyZeKaaaGYSaHxDTrQ6IyQ6+avzm+uayQyqVvc7rMdBxaDgAOk0NiO431/nqXe84lPhcfCUN19/yOxIOUJdIbjuwNrovn89t8n5by9XigO1z8TktbC+3fkdddd+bTQ72FeXBzjwdrVIBkF7Pi5NvulImk2n5laDPx9oN0RamKZsbhNLbDGrNvXJr6pVdc68qv1kv3bbt1R8wU9+su+V6r7ktDRTbvArcdh1/ibt1QGx+wPlmQCorC1Ibr2uwzSYHyVYGsG+ud5scEJvcrtNwWWcGwdZUVvZh7drNXbZ9L508bucG5wAABPwuJ4/buUvvVwVAerWmTr55ZM5/WLW2GjO8fyemU5sbqJqawmy4vbbcV0sDdroA5kIbDBStDTxNDEIlrvvNK8U2vnJr+CrQobxPiFg00fSruy0DXXteKfpa3xfXcXC6cBBsSjEPjL3FlhcceheASBeKxutYua6GlWtrWLmuhtcWrdzm+FsylWH2258z++3PO3VfHT0e6LoOfp+LW9LSNGrbXtk13q7PdehbHiZaU9vB7TXxKrFRPseh2wfBpmhglJ7ksL0Gc9heg7v1360KgBSleCLJqnVRVq6rZuXaGlatyw74GzZ/c5wtGPA1e/INwDVnHuDJ8cCupoFRREAFQHq4RF2K1VVRVq2r4Yt11azKvbJf93V86zolfpchFaXsvmN/hlWWMmRgKUMrSxlQHmLqr+Y3e/LNrsP6deeuiIh0KxUA6RGSqTRfro82mL5fua6GrzZEt779x+c6DK6IMHJIOWP33Z6hlWUMrSylsm8Y1236FblXJ9+IiHhNBUAKSjqd4auNMVaurd462K9aV8Oa9dGtZ3g7DmzXP8KwylIO2WMQQyvLGDKwlO36h/H72vf1Fl6dfCMi4jUVAPFEOpOh6ut4bpCvzh6jX1vD6vVR6uq9Gq/sF2LowDJG7TowO3U/sJTtKyKU+H15y3Jg4FP
26fcsGf96nLIBBAKnACoAIlLcVACkS2UyGTZWJxq8ol+5rppV66LU1qW2rjegPMiQgaXsudOArcfoh1SUEgzkb6BvSmLpfGpffxSSiWze6qrsZSCw6+Fdet8iIl5SAZBmtfcDcjbV1Bvot/ysrSFWm9y6TnlpgKEDS3PH6EsZOjA7fR8Jdc8/xUwmQ6a2mkz1ejI166md/8TWwX+rZILEv55RARCRoqYCIE1q6dup9t25YuuJeKtyr+hXrqthc7Ru6+1LQ36GVpZx6J7b5Qb67Nn3fSKBLsucyWSgtoZ0zQYyNVWkq9eTqdmQ+71+629Sda1vq7qqy3KKiBQCFQABsoNnoi5NPJEklkjxp5c/bnBmPGS/neqhWR81+NKNUMDH0IGl7L/rQIYMLNs62PctDeT9w2AyiWh2EK9eT7qm4aC+Zdk2r+YdFyfSD6dsAL6Bw3F22h+3dABO2QDc0gHEXribTM2Gbe7LKe2f1+wiIoVGBaADPnplLn2WzGYT1XxNGZt3m8SeR0/o9hzpdIZ4Ikk8kSKWSBGvzf69ZVk8kSLWaNk3l+uvl/3dlk+QzWTg1KN3ZujAMoYOLGVAeTAvA30mEcsN6htIV1flpug3ZJdtGdzr4g1v5DjZwb20P+6AYfh23A+3tP/Wwd0pq8AJ98Vxm39nQODgUxucA7A1Dy6ZeDVOqKzT+yYiUohUANrpo1fmst2Spwk42RPY+lFNZMnTfARtKgF1yTSxLQNvEwP2lmWxRstijddLJEnUpVu9P8i+Pz4U8BEK+AkFfYQCPkpDfirKg9llAR+hoI/wlr8Dfv7w0lI2x7adKq8oDzLhkOHteswydbUNXq3XH9Qz1RtI11RBItboVg5OuDw7mPfbHt+wvRq8cnfKBuBE+uK4nfsnvOU4f+Jfz5Cpzr4LwD/yYOo+fIHorJsIT7oKN1zeqfsQESlEKgDt1GfJ7K2D/xYBJ8XQJX9izbJ5JPFlfzI+EhkfibS79ac25ZDIZK+vy2TXSeJm/84tq8vdFp8f11+C6w9QEggQKgniD5fg7xukJFhKSTBIIBgkFCwhHNwycPvqDejZ3+GAD7/Pbf+rdAfef2E2JwYX0t+tYUO6lOdqD2DfcZMarJZJJnKD+obsoF5d1fByzXqordl28+FynNIBuOWV+IYYnNIK3LL+2WVlA3Ai/XF83fPPM7Dr4QR2PbzBR+T6h+1FbN6dxGbdRHjS1biRvt2SRUSku6gAtFNfqptc7pDBxisIummCbpoSJ0XATRPy1VHiT+MniY8UvkwSN5PCzf1uk2Tup/GLZADXB74SHF8J5H4cXwn4S3BcP3X+EurqXe/4/Nuu1/i2vhL2//oT9iydvzXjAF8Np5W+iX9ZFdEVgexx95oNZOLbfqa8E+qDU9ofp6yCksG7fjOobx3c++H4u+5kwHzwD9ub8ITLiT13e7YETL4aN6KPBhaR4qEC0E5fU0a/JkrA15Qx7sfXtmtbmUwaUklI1ZFJ1X3zO1nv71Ry63Wk6sgk6yCdzP6uf5tGt9+6LFlHpjZKpsF6yW/up4US0vjIuZtJk171H9wBw7In1Q0amRvUK7LH4XODfKEP7m3lH7IH4QlXEJt7G9GZ04lMnoqrkwNFpEioALTT5t0mEal3DgBAIuNj826TWrhV0xzHBX8A/AG8+u64TDrVZAmJPvU/zd2C0u/8slszesm/vSEy8Uqic2cQnXljtgSUVXgdS0Sk09r3wenCnkdP4MvdvsPGTBmZDGzMlPHlbt/x5F0A+eC4PpySIE6oDLe0P275IHz9h+I0M8g1t7yY+QbvSmTSVWRim4nOnE5681qvI4mIdJpmADpgz6MnwNETivp71QOjT9n27XH+AIHRp3iWyUu+QTsTmXw10dm3fHM4oHyQ17FERDpMMwDSpMCuhxMce27uFb+DU1ZBcOy5vfrjcX2VI4hMnkqmLp6dCfh6jdeRREQ6TDMA0qym3h7X2/kGDicyeSqx+jMB/bb3Opa
ISLsV3AyAMeYMY8xHxpilxpifNHH9dcaY5caYRbmfbdYR6Uq+ih0JT54KmTTRmTeS2rDS60giIu1WUAXAGDMUuAE4AhgFnG+M2bPRagcBp1lrR+V+7u3unCK+AcMIT54GOMRmTie1/guvI4mItEtBFQDgOOBla+16a20N8DTwnUbrHAT8f8aY940x9xhjQt2eUgTw9R9CZMo14PNnS0DV515HEhFps0I7B2AIsLre5dXAwVsuGGPKgHeBq4CPgUeBa4Hm3rS+jYqK/H65S2Vln7xur1BpP5u7QR/qzvlfVj1+HfE5N7P96dcR3H5k14TLIz2fxUX7WVy6az8LrQC4QP3vpHOArd94Y62tBiZuuWyMmQE8TDsKQFVVNel0G772rg16y8lx2s/WlBKaOJXorOmsfPw6IhOvxDeocEuAns/iov0sLvncT9d1WnzRW2iHAL4A6p9SPRhYteWCMWZHY8wP613vANt+ZZ1IN3PLK4lMuQYnWEp09i2kvvzY60giIi0qtALwInCsMabSGBMBTgGeq3d9DLjZGDPCGOMAPwH+7EFOkW24fQZmS0C4nOicW0muWeJ1JBGRZhVUAbDWriQ7nf8KsAh40lr7T2PMHGPMQdbatcAFwEzAkp0BmOFZYJFG3LIBRKZMw430IzZnBslVi72OJCLSJCeTyc/x8B5gJ2CZzgFoP+1n+6WjG4nNvpn0pnWET7wU/9DG72b1jp7P4qL9LC5ddA7ACOCzba7Py72ISANupB/hydNw+w4i9tztJL/4wOtIIiINqACIdBE3XE4491HBsXl3kPz8Pa8jiYhspQIg0oXcUB8ik67G7T+M2PN3k1z+rteRREQAFQCRLueEyohMugq3Ykdiz99D3bJ3vI4kIqICINIdnGApkUlX4g4aQfzFe6n79J9eRxKRXk4FQKSbOIEIkQlX4NtuF+Iv3U/dx297HUlEejEVAJFu5ATChCdcjm/wbsRfeYC6JW96HUlEeikVAJFu5pSECE+4DN+QPYi/+hB1i//udSQR6YVUAEQ84PiDhE+4FN+wvYj//WES/3nV60gi0suoAIh4xPEHCB9/Cb4d96P29UdJfPiS15FEpBdRARDxkOMPEB7/U/zD96f2zcdIfPCC15FEpJdQARDxmOMrIXTcT/DvdCC1858g8f5cryOJSC+gAiBSAByfn9BxF+EfeTC1b/+R2kWzvI4kIkXO73UAEclyXD+hYy4g7rok/vk0pFMEDzjJ61giUqRUAEQKiOP6CB11PnHHR2LBnyGdJnDgf+E4jtfRRKTIqACIFBjHdQmNO49a1yWx8K+QThEYfYpKgIjklQqASAFyXJfgkT8A10di0Swy6STBQ76nEiAieaMCIFKgHMcleMQ54Pioe/+57DkBh52hEiAieaECIFLAHMchOOZMcH3UffA8pNMEx5ypEiAinaYCIFLgHMcheNjp2RLw/lzIpAgecTaOo3fxikjHqQCI9ACO4xA85Ls4uXMCSKcJHnmuSoCIdJgKgEgP4TgOgdGnZE8MXPhXMukUoXHn4bgqASLSfioAIj2I4zgED/o2uC6JBX8mnkkROuq/cVyf19FEpIdRARDpgYIHnJSdCfjn08TTaULHnI/j6j9nEWk7/R9DpIcKjpqM4/qoffuPxNMpQsdehOPTf9Ii0jY6eCjSgwX2nUDwsDNIfvYO8RfvJZOq8zqSiPQQKgAiPVxgn+MJjjmL5PJ3iT1/N5lkwutIItIDqACIFIHAXscSHHsuqRXvE3v+LpUAEWmVCoBIkQjscRShceeR+uJDYvPuJJOs9TqSiBQwFQCRIlJixhI66kekVn1EbO7tZOriXkcSkQKlAiBSZEp2G0Po6PNJrbHE5t5GJhHzOpKIFCAVAJEiVLLLYYSOvYjUlx8TnTuDTCLqdSQRKTAqACJFqmTkwYSO+zHpr5YRnX0rmdoaryOJSAFRARApYiUjDiI8/qekq5YTnX0zmXi115FEpECoAIgUOf9O+xM
+/hLSG1YSnX0TqegmryOJSAFQARDpBfw77kf4hEtJb1zD6ieuIx1TCRDp7VQARHoJ/7C9CZ94GXXr1xCbNZ10dKPXkUTEQyoAIr2If+ieDD7t/5HeXEVs5nTSNRu8jiQiHlEBEOllwsP3IjzxCtLRjURnTSddvd7rSCLiARUAkV7IP3g3IhOvJBPdRHTmjaQ3r/M6koh0MxUAkV7Kt90uRCZdRaa2JlsCNq31OpKIdCMVAJFezDdoJJHJV5Opi2dLwNdfeh1JRLqJCoBIL+cbuBORyVMhmciWgI1rvI4kIt1ABUBE8FXsSHjKVEiniM68kdSGVV5HEpEupgIgIgD4BuxAeMo0IENs1nRS61d6HUlEupAKgIhs5es/lMiUa8BxsyWgaoXXkUSki6gAiEgDbr/tsyXAV0J01nRS65Z7HUlEuoAKgIhsw+27HZEp03BKQkRn3URq7TKvI4lInhVcATDGnGGM+cgYs9QY85Mmrh9ljFlgjFlijHnIGOP3IqdIsXPLB2VLQLCU6OybSX31ideRRCSPCqoAGGOGAjcARwCjgPONMXs2Wu1x4KfW2t0AB/jv7k0p0nu4fSqzJSDUh+jsW0iuWep1JBHJk4IqAMBxwMvW2vXW2hrgaeA7W640xgwHwtbat3OLHgVO7faUIr2IW1ZBZMo1OJF+xObcSnK19TqSiORBoU2fDwFW17u8Gji4leuHtecOKirKOhyuKZWVffK6vUKl/Swu7d7Pyj4kz/lfVj95PfHnbmPwd68hvNM+XRMuj/R8FhftZ34VWgFwgUy9yw6Qbsf1raqqqiadzrS+YhtUVvZh7drNedlWIdN+FpeO76efwIlXkZp9C6v/cAPhE36Gf9jeec+XL3o+i4v2s/1c12nxRW+hHQL4Ati+3uXBwKp2XC8iXciN9CU8+WrcvoOJzbuD5Ir3vY4kIh1UaAXgReBYY0ylMSYCnAI8t+VKa+1yIG6MGZNbdBYwt/tjivRebricyOSpuP2GEpt3F8nli7yOJCIdUFAFwFq7Evgf4BVgEfCktfafxpg5xpiDcqt9H7jdGLMYKAPu8iatSO/lhMqITL4at2IHYi/cTd1n73gdSUTaqdDOAcBa+yTwZKNlE+v9/R4NTwwUEQ84wVIiE68kOncG8Rfug2MvpGTkaK9jiUgbFdQMgIj0LNkScBW+QSOJv/Qr6j75h9eRRKSNVABEpFOcQJjwxCvwDd6V+Mv3U7d0vteRRKQNVABEpNOckhDhEy/Ht/3uxF/5NXX2da8jiUgrVABEJC+ckiDhEy/FN3RP4q89TGLxa15HEpEWqACISN44/iDhE36Gb4d9qP37IyQ+etnrSCLSDBUAEckrxx8gfPzF+Hbcj9o3fkfigxe8jiQiTVABEJG8c3wlhMdfjH+nA6id/wSJ9+d5HUlEGlEBEJEu4fj8hI77Mf4RB1H79u+pXTTH60giUo8KgIh0Gcf1Ezr2Ivw7H0Lin3+iduHfvI4kIjkF90mAIlJcHNdH6OjziTsuiQXPQiZN4ICTcBzH62givZoKgIh0Ocf1ETrqv4m7PhLv/AXSKQIHnawSIOIhFQAR6RaO6xIa90NqXR+Jd2dmS8DBp6oEiHhEBUBEuo3juATHngOuj8R7c8ikUwQPPU0lQMQDKgAi0q0cxyU45ixwfdT9ex6kUwQP/75KgEg3UwEQkW7nOA7Bw84Ax82WgEya4JgzcRy9MUmku6gAiIgnHMfJTv/nDgeQThIce65KgEg3UQEQEc84jkPg4FOz5wS8O5NMOk3oyB/iuCoBIl1NBUBEPOU4DsHRp4DrJ/HOn4mnU4SO+hGO6/M6mkhRUwEQkYIQPPAkcF0S/3qGeCZN6OjzVQJEupAKgIgUjOD+U3BcH7X/+FN2JuDYC3Fc/W9KpCvoQJuIFJTAfhMJHno6yWULiL94H5lU0utIIkVJBUBECk5g3xMIHn4myc8WEnvhbjKpOq8jiRQdFQARKUiBvY8jeMTZpD5/j9j
zd5FJJryOJFJUVABEpGAF9jyG4JE/ILXiA2Lz7iSTrPU6kkjRUAEQkYIW2H0coaPOI7XyI2LP3UGmTiVAJB9UAESk4JXsdgSho/+b1OrFxJ67jUwi5nUkkR5PBUBEeoSSXQ8ndMyFpNYsJTp3hkqASCepAIhIj1Gy8yGEjr2I9FfLiM65hUxtjdeRRHosFQAR6VFKRo4mNP4npNctJzrnVjLxaq8jifRIKgAi0uOU7HQA4fEXk65aQXT2zSoBIh2gAiAiPZJ/+CjCJ1xCeuMqorOmk45t8jqSSI+iAiAiPZZ/h30Jn3Ap6a+/JDbrJtLRr72OJNJjqACISI/mH7Y34QmXk968ltis6aSjG72OJNIjqACISI/nH7IH4QlXkK5eT3TmdNI1G7yOJFLwVABEpCj4tzdEJl5JJrqR6MwbSX691utIIgVNBUBEioZv8K5EJl1FJr6ZVY/9nPRmlQCR5qgAiEhR8Q3amcikq0nXRrOHAzZ95XUkkYLU4QJgjAkbY/YxxjjGmEg+Q4mIdIavcgTbf/86qKslOvNG0l+v8TqSSMHpUAEwxhwKfALMBoYCK4wxh+czmIhIZwQHjyQ8eSqkkkRnTie1cZXXkUQKSkdnAG4BjgOqrLVfAGcBd+YtlYhIHvgqdiA8eRpk0sRmTie1fqXXkUQKRkcLQMRa+9GWC9baOYA/P5FERPLHN2BotgTgEJs1ndT6FV5HEikIHS0AdcaY/kAGwBhj8hdJRCS/fP2HEJlyDfj8xGbeRGrdcq8jiXiuowXgBuA1YJgx5vfAfOB/85ZKRCTP3H6DsyXAHyA6+2ZSaz/zOpKIpzpaAJ4DTgauA94EjrDWPpO3VCIiXcAtH0RkyjU4gTDR2TeR+upTryOJeKajx+3/Za0dBXyczzAiIl3NLa8kMuUaojOnE519C5GJV+DbbhevY4l0u47OANQYY4blNYmISDdxyyqyMwGRcqJzbiW52nodSaTbdbQAlALLjDGfGmPeN8b82xjzfj6DiYh0JbdsAJHJ03BL+xObO4Pkqv94HUmkW3X0EMDPAAfYKbeNT4B0njKJiHQLt7Q/4clTic2+mdjc2wmf8DP8w/byOpZIt+hoAVgF/IXsp69NI2gAACAASURBVAA6wDpgcmfDGGN2BB4HBgEW+L61trrROsOBD8iWDoAvrbUndPa+RaR3ciP9CE+eli0B8+4gfPwl+HfYx+tYIl2uo4cA7gZuttb2s9b2JfsWwHvzkOc+4D5r7e7AAuDaJtY5CHjSWjsq96PBX0Q6xQ2XE548Fbff9sTm3Uny8/e8jiTS5TpaALaz1v52ywVr7SNAZWeCGGNKgCOBp3OLHgVObWLV0cDexphFxpiXjTGq6iLSaW6oD5FJV+MOGEbs+btIfvau15FEulRHDwH4jTEDrLXrAYwxA8l9KmAnDAQ2WWuTucurgabeaRAne5jgAeBE4C/GmD2stYm23ElFRVknYzZUWdknr9srVNrP4qL9bE4fUuf8gjW//yWxF+9hu29fTunuh3ZJtnzS81lcums/O1oA7gbeNsb8kezAfxpwe1tvbIw5tYn1l7JtidjmxEJr7fX1Ls4xxtwI7AG0ac6uqqqadLqzXSWrsrIPa9duzsu2Cpn2s7hoP1tXcvxl1M29jS+fnUHomAsp2fngPKfLHz2fxSWf++m6TosvejtUAKy1DxpjlpJ9Be4DLrLWvtSO2z8FPFV/We4QQJUxxmetTQHbkz3ZkEbrXUz2HICq3CIHqOvIfoiINMUJRIhMuILYc7cTf/lXkElRssthXscSyasOnQNgjBkKnGqtnQr8GrjYGDO4M0GstXXA68D3covOBuY2seo44LxcjnFkC8jizty3iEhjTiBMeMIV+AYb4q88SN2SN72OJJJXHT0J8Ld8M+guB14FHs5Dnh8D5xtjPgLGAv8PwBhzoTHmF7l1fgaMN8Z8ANwKnG6t1WcQiEjeOSVBwhMuwzdkT+KvPkTd4r97HUkkbzp
6DsBAa+1dANbaOHCHMeaczoax1i4Hjmpi+f31/l4JjO/sfYmItIXjDxI+4WfEnr+L+N8fJpNOEdjzaK9jiXRaR2cA/MaYIVsuGGO2I3ssXkSk6Dj+AOHjL8G3437UvvFbEh++6HUkkU7r6AzAbcAiY8xzZM/cPw64Km+pREQKjOMPEB7/U+Iv3kftm49DOkVgH30OmfRcHZoBsNY+THbQf5fsJ/adYK19Mp/BREQKjeMrIXTcT/DvdCC1b/2exHtNnacs0jN09BAAQLW19nZgGfBtY0zfPGUSESlYjs9P6LiL8I88mNp//JHad2d5HUmkQzp0CMAY80Du9x3Ag8A8su8COCV/0URECpPj+gkdcwFx1yXxr6chnSJ44ElexxJpl47OABwIXAR8G/ittfYHwPC8pRIRKXCO6yN01Pn4dxtD4p0/U7vgWTKZ/HzKqEh36GgBcHPvvR8PvJxbFslPJBGRnsFxXULjzqPEHEli4d9I/OtplQDpMTr6LoCPjTFzgJHAq8aYJ2jjZ/GLiBQTx3EJHnkuuC6JRbPJpFMED/kejqN3Rkth62gB+AHZ6f83rLV1xpjXgd/lL5aISM/hOC7BI84Bx0fd+89lzwk47AyVACloHf0yoBqyX8m75fL9LawuIlL0HMchOOZMcH3UffB8tgSMORPH6cybrUS6TkdnAEREpBHHcQgednq2BLw/F9JpgmPPVgmQgqQCICKSR47jEDzkuziuj8SiWZBJERz7AxxXJUAKiwqAiEieOY5DYPQp4PpILPwrmXSK0LgfqQRIQWmxABhjfMBJQBqYZa1N5pafaq19qhvyiYj0SI7jEDzo29kSsOBZ4uk0oaP/G8f1eR1NBGj9cwB+BxwAjALeMMbsklt+UZemEhEpEsEDvkXg4FNJfvI28ZfvJ5NOeh1JBGj9EMAQa+33AYwxvwUeMcZc3+WpRESKSHDUJBzXpfbtP2ZnAo69CMenI7DirdZmAILGmCCAtXYZMAW4Eti7q4OJiBSTwL4TCB52BsnP3iH2wj1kUnVeR5JerrUCcDnQf8sFa+1msucEXN6VoUREilFgn+MJjjmL1OeLiD1/N5lkwutI0ou1OAdlrX27iWUp6n0IkIiItF1gr2PB9VH7+qPEnr+L8PGX4PgDXseSXqhd70kxxgwwxuzczHX6Fywi0gaBPY4iNO48Ul98SOy528kka72OJL1Qm89CMcb8EHgAcI0xC4AJQB1wCtnDAscBfboipIhIsSkxY8Fxib/2ELG5txE+8TKckpDXsaQXac8MwLXA2cCOwGKyhwGWA5flLk/OezoRkSJWstsYQkdfQGrNUmJzbyOTiHkdSXqR9rwPZZC19vcAxpifAeuB71hrn+2SZCIivUDJLoeC6xJ/6QGic24lMvEKnEDE61jSC7RnBiC15Q9r7UZgswZ/EZHOKxl5MKHjfkx63WdEZ99CprbG60jSC7SnAJQZY740xsw1xvwv2XMBduqiXCIivUrJiAMJj/8p6aoVRGffTCZe7XUkKXLtKQADgO8B84AdgE+BJcaYTcaY+caY+7sioIhIb+Efvj/h4y8mvWEl0dk3kY5t8jqSFLE2nwOQm/Z/NfcDbH3r397A/mS/L0BERDrBv+N+hE+4lNi8O4nNupnw5Ktxw+Vex5Ii1KkPo7bWJoCFuR8REckD/7C9CZ94GbHn7iA2c3q2BET6eR1Lioy+nFpEpAD5h+5JeMLlpKuriM6cTrpmg9eRpMioAIiIFCj/kN0JT7yCTHRjtgRUV3kdSYqICoCISAHzD96NyMQrycQ2ZUvA5nVeR5IioQIgIlLgfNvtQmTSVWRqa4jOvJH0pq+8jiRFQAVARKQH8A0aSWTy1WTq4tmZgK+/9DqS9HAqACIiPYRv4E5EJk+FZCI7E7BxtdeRpAdTARAR6UF8FTsSnjIV0imiM6eTWLvC60jSQ6kAiIj0ML4BOxCeMg3IsOrxn5Na/4XXkaQHUgEQEemBfP2HEplyDY7
rJzbrJlJVn3sdSXoYFQARkR7K7bc9Q876BfhKiM66idS6z7yOJD2ICoCISA9WMmB7IlOm4ZSEiM66mdRXn3odSXoIFQARkR7OLR+UPRwQLCU6+xZSX37sdSTpAVQARESKgNtnYHYmINyH6JxbSa5Z6nUkKXAqACIiRcItq8jOBET6EZtzK8lVi72OJAVMBUBEpIi4pf2JTJmGW1ZBbO5tJFd+5HUkKVAqACIiRcaN9CM8eSpu+SBiz91O8osPvI4kBUgFQESkCLmRvoQnX43bdzCxeXeQ/Pw9ryNJgVEBEBEpUm64nMjkqbj9hhJ7/m6Sy9/1OpIUEBUAEZEi5oTKiEy+GrdiB2Iv3EPdsne8jiQFQgVARKTIOcFSIpOuwh24E/EX76Pu0395HUkKgN/rAI0ZY34JpKy11zdxXQD4DXAQEAPOsNbqfS4iIq1wAhEiE68kNvc24i/9CtIpSnY51OtY4qGCmQEwxvQ1xvwGuKKF1S4Baqy1ewCXAo92RzYRkWLgBMKEJ16Bb/CuxF95gLql872OJB4qmAIAnAQsBWa0sM4k4AkAa+3fgUpjzI7dkE1EpCg4JSHCJ16Ob/vdib/ya+rs615HEo8UzCEAa+3vAIwx17ew2hBgdb3Lq4FhQJu/B7Oioqwj8ZpVWdknr9srVNrP4qL9LC7t388+pM+8li+fuonYa7+hrLSE8v3Hd0m2fNLzmV/dXgCMMacCtzdavNhae1wbbu4CmXqXHSDdnvuvqqomnc60vmIbVFb2Ye3azXnZViHTfhYX7Wdx6cx++o7+Cb7kPaybcz+bv64hsNexeU6XP3o+2891nRZf9HZ7AbDWPgU81cGbfwFsD3ySuzwYWJWPXCIivY3jDxA+/mJiL9xL7ZuPQSZNYO/CnwmQ/CikcwDaYg5wNoAx5gggbq1t8/S/iIg05PhKCI//Kf6dDqR2/hMk3n/O60jSTQq+ABhjLjTG/CJ38W4gaIz5ELgLOMu7ZCIixcHx+QkddxH+kaOpffsP1C6a7XUk6QYFcxLgFo3f/2+tvb/e33HgnO7OJCJS7BzXT+iYC4k7PhL/fArSKYIHfMvrWNKFCq4AiIiINxzXR+jo84m7LokFz0I6ReDA/8JxHK+jSRdQARARka0c1yU07kfEHZfEwr9mS8DoU1QCipAKgIiINJAtAT+k1vWRWDSLTDpF8JDvqgQUGRUAERHZhuO4BMeeA66PuvfnZs8JOOx0lYAiogIgIiJNchyX4JizsiXgg+chkyJ4+JkqAUVCBUBERJrlOA7Bw87IzQQ8l50JOOJsHKfg30UurVABEBGRFjmOQ/CQ7+G4PhKLZkM6TfDIc1UCejgVABERaZXjOARGfwdcH4mFfyOTSRE68jwcVyWgp1IBEBGRNnEch+BBJ4PjI/HOn4mn04SO+hGO6/M6mnSACoCIiLRL8MCTwHVJ/OsZ4ukUoWPOx3E1nPQ0esZERKTdgvtPwXF91P7jT8QzaULHXIjj05DSk+jgjYiIdEhgv4kEDz2d5LIFxF+8l0yqzutI0g4qACIi0mGBfU8gePiZJJe/S+yFe8gkE15HkjZSARARkU4J7H0cwSPOIfX5e8Sev0sloIdQARARkU4L7Hk0wSN/QOqLD4nNu5NMstbrSNIKFQAREcmLwO7jCB11HqmVHxGbezuZOpWAQqYCICIieVOy2xGEjjmf1BpLbO4MMomY15GkGSoAIiKSVyW7HEbomAtJffkxUZWAgqUCICIieVey8yGEjr2I9FfLiM6+hUxtjdeRpBEVABER6RIlI0cTGv8T0lXLsyUgXu11JKlHBUBERLpMyU4HEB5/Men1XxCdfTPp+GavI0mOCoCIiHQp//BRhE+4hPTGVcRm3UQ6tsnrSIIKgIiIdAP/DvsSPuEy0l9/RWzWdNLRjV5H6vVUAEREpFv4h+1FeMJlpDevIzZzOumaDV5H6tVUAEREpNv4h+xBeMIVpKMbic6aTrp6vdeRei0VABER6Vb
+7Q2RCVeQiX5NdOaNpKurvI7UK6kAiIhIt/MN3pXIpKvI1FZnS8DmtV5H6nVUAERExBO+QTsTmXQ1mUSM6N9uJL3pK68j9SoqACIi4hlf5Qgik66GZCI7E7BxjdeReg0VABER8ZRv4HDCk6dCKkl01nRSG1d5HalXUAEQERHP+Sp2IDx5GmTSxGZOJ7V+pdeRip4KgIiIFATfgKGEp0wDxyU2azqpqhVeRypqKgAiIlIwfP2GEJkyDXz+7OGAdcu9jlS0VABERKSguH0HE5lyDY4/SHT2zaTWfuZ1pKKkAiAiIgXHLR+ULQGBMNHZNxFfucTrSEVHBUBERAqSW16ZLQHBMlY/+QtSa5Z6HamoqACIiEjBcssqiEy5Bn9ZP6JzZ5Bcbb2OVDRUAEREpKC5ZQPY/sxf4Jb2JzZ3BslV//E6UlFQARARkYLn7zOA8ORpuH0GEpt7O8kvPvQ6Uo+nAiAiIj2CG+mbLQF9BxGbdzvJFe97HalHUwEQEZEeww2XE548FbffEGLz7iK5fJHXkXosFQAREelR3FAfIpOuxh0wjNgLd1P32UKvI/VIKgAiItLjOKEyIpOuwq0YTvyFe6n79F9eR+pxVABERKRHcoKlRCZdiTtoBPGXfkXdJ//wOlKPogIgIiI9lhOIEJlwBb7tdiH+8v3ULZ3vdaQeQwVARER6NCcQJjzhCnyDDfFXfk3dkje8jtQjqACIiEiP55QECU+4DN/QPYm/+hsSi1/zOlLBUwEQEZGi4PiDhE/4Gb4d9qb274+Q+OhlryMVNL/XAZpijPklkLLWXt/EdcOBD4BPcou+tNae0I3xRESkQDn+AOHxFxN78V5q3/gdpNME9j7O61gFqaAKgDGmL3AbcDpwczOrHQQ8aa29oNuCiYhIj7GlBMRfvJfa+Y9DOkVgX71ObKzQDgGcBCwFZrSwzmhgb2PMImPMy8aYfbonmoiI9BSOz09o/E/wjziI2rd/T+K9OV5HKjhOJpPxOsM2jDHXAzRzCOB64EvgAeBE4G5gD2ttopXN7gQsy2NMEREpcJlUkq/+eic1/5lP/6POoP+YU7yO5IURwGeNF3pyCMAYcypwe6PFi621rR6oaVQK5hhjbgT2AN5ry31XVVWTTuen9FRW9mHt2s152VYh034WF+1ncdF+ts4Zcx7+ujQbXn2Sms0xggeelOd0+ZPP59N1HSoqypq93pMCYK19CniqI7c1xlxM9hyAqtwiB6jLVzYRESkujusjdNT5xF0fiXf+DOkkgYNOxnEcr6N5qqBOAmyjcUAYuNkYMw7wAYu9jSQiIoXMcV1C486j1vGReHdm9sTAg0/t1SWgRxQAY8yFwBBr7c+BnwGPGmPOBmLA6dbatKcBRUSk4DmOS/DIc8H1kXhvDpl0iuChp/XaElCQBaDxyX/W2vvr/b0SGN/dmUREpOdzHJfgEWeD61L373mQSRM87IxeWQIKsgCIiIh0FcdxCB5+Jrj+bAlIpwiOORPHKbR3xnctFQAglUqyYcNaksnW3knY0FdfuaTTxX/0wev99PsD9O9fic+nf64ikh+O42Sn/3OHA0gnCY49t1eVAP0fFdiwYS2hUITS0sHtmgby+12SyeIvAF7uZyaToaZmExs2rGXgwO09ySAixclxHAIHnwqOS2LRLDLpNKEjf4jj9o4SoAIAJJOJdg/+0j0cx6G0tJzq6o1eRxGRIuQ4DoHRp2RPDFz4V+KZNKFxP+oVJUAFIEeDf+HScyMiXclxHIIHfTtbAhY8SzydInT0+Tiuz+toXUoFQEREBAge8K1sCfjnU9kScOyFOG7xDpPFP8chIiLSRsFRkwgeehrJZQuIv/grMqmk15G6jAqAiIhIPYF9TyR4+PdJfvYOsRfuIZMqzk+bL965DSkI7723iDlz/kZdXR1lZWVcfvlUryOJiLQqsPd4cH3UvvE7Ys/fTXj8T3H8Aa9j5ZUKQAGaMeMm/v3v90gm6/jiixXstNNIAE499TQ
mTfpWq7dfvPgj/vKXZ5g27dpOrdMWixYt5KqrLmXo0GHE43EqKyu57robGDhwIAD77TeK/fYbBcC0aZcTjUaJRCKduk8Rke4Q2PMYcFxqX/8tsXl3Ej7hEhx/0OtYeaMC0EFvfbiGZ//+KVVfx6koD3LyuJ05bK/Bedn2FVdkXyWvXr2Kiy++gEcffbJdt9999z2ZNm3PTq/TFkuWLGbMmLFcf/0NZDIZLr30xzz99B+48MKfNlhv/vw3GD58hAZ/EelRAnscheP6iL/2MLHn7iB8wqU4JcVRAlQAOuCtD9fw27mLSeQ+HKdqUy2/nZv9QsJ8lYCmLFy4gF/96i5SqTQjR+7MtGnXMmPGdD799BPWr1/PLrvswvXX38CHH37Aww8/yA9/eD6PPfYIoVCIzz5bxs4778J1191ASUkJCxcuaHWd+++/h1dffYl+/foxYMBAjjjiSCZOnNIgk7WLGTEiO0PhOA5DhgyjpKSkwTpz5sxk9epVXHTRxV322IiIdJUSMxYcl/hrDxF77jbCJ16GUxLyOlanqQA08ua/V/PG+6tbXOeTVV+TTGUaLEsk0zwy5z/8fdGqZm93xL7bM2afzn2a3YoVn/P007MoKytj0aKF+P0lPPDAI6TTaS655ELeeutNysv7bl3/gw/e54knnmbgwEouuOBc/vGPtzjiiCMbbLOpdQDef38Rjz32J+rqajnnnDO2uR1kZwDGjTsGgGXLPmXFiuWcf/6Pt17/5puv8+tf/4rDDz+CW275P370o4vo379/px4DEZHuVrLbGHB9xF95kNicGYQnXI4TCHsdq1NUADqg8eDf2vJ82mGH4ZSVlQEwatQBlJf35Zln/sTnn3/GF1+sIBaLNSgAI0bszKBB2wEwfPgINm/etM02m1rH2v9wzDHHUVJSQjgcZOzYcdvcrrY2zuefL+fBB+/lvvvu5KuvvmLGjLsaDPBjxoxlzJixeX0MRES8ULLLoeC6xF96gOicW4lMvAIn0HMPa6oANDJmn9ZfpV9135tUbardZnlFeZCp3z+gq6IBEAx+c+zpjTde46GHHuDUU09j4sRvsXHjRjKZhiUkEPjmrFXHcba5vrl1XNclnW650CxdupQ+ffrw+ONPAXD33bfx8MMPcvfdD3Ro30RECl3JyIPB8RF/6T6is28hMvFKnGCp17E6RJ8D0AEnj9uZgL/hQxfwu5w8buduzbFgwT855pjjmDTpW5SVlfHuu++QTqfysu2DDjqE1157mbq6Ompqqpk//41tPpJ3yZLF7LHHXlsvn376Wbz//iI2bNiQlwwiIoWoZMSBhMf/lHTVCqKzbiYTr/Y6UoeoAHTAYXsN5pwJu1PRN3sSSEV5kHMm7N6lJwA2ZcqUb/Pii/M4++zvce2109hnn31Ztar5cxDa4/DDj2C//fbnBz/4PpdffgkDB1YSCDQ883XpUtugAAwcWMlee+3Dm2++lpcMIiKFyj98f8LHX0J640qis24iHdv28Gqhc5qaEi5SOwHLqqqqt5naXrNmOYMHD2/3Bov564A/+OB9Vqz4nAkTJgMpzjvvXK655ufsssuunuTp6HPUHpWVfVi7dnOX3kch0H4WF+2nt5JffEBs3p245YMIT7oaN9K39Ru1IJ/76boOFRVlACOAz7a5Pi/3IkVnxx2H88IL8zjnnNM555zvc+yxx3s2+IuIFCr/sL0Jn3gZ6U1ric2aTjrac766XCcBSpPKy/ty2213A8U90yEi0ln+oXsSnnA5seduJzpzOpHJU3FLC//tzpoBEBER6ST/kN0JT7yCTHQj0Zk3kq6u8jpSq1QARERE8sA/eDciE68kE9tMdOZ00pvXeh2pRSoAIiIieeLbbhcik64iU1uTLQGbvvI6UrNUAERERPLIN2gkkclXk6mLZ0vA12u8jtQkFQAREZE88w3cicjkqZCqy5aAjS1/x4wXVABERES6gK9iR8KTp0ImTXTmjaQ2rPQ6UgMqACIiIl3EN2A
Y4cnTAIfYzOmk1q/wOtJWKgAiIiJdyNd/CJEp08D1EZt5E6l1y72OBKgAiIiIdDm33/ZEplwD/gDR2TeTWveZ15H0SYDirffeW8ScOX+jrq6OsrIyLr98qteRRES6hNt3OyJTphGddRPRWTcTmXglvkEjvcvj2T1Lsy666DxefHFeg2WxWIyJE49l48amP2d64cIF/PSn57N48UdMn/7LJte54YbrmTNnZrP3W11dzTXXXAnQ4nbaa9GihYwffyTnnnsGp512MhdffAHr1q0DYL/9RnHNNT/n5z//JV999SXRaDQv9ykiUojc8kFEplyDEywlOvsWUl9+7F0Wz+5ZmjVp0rd4/vnnGix77bWXOeCAg+jXr1+Lt9199z2ZNu3aDt3v5s2bWLrUdno7jS1ZspgxY8by6KNP8vvfP4Prujz99B8arDN//hsMHz6CSCSSl/sUESlUbp+BRKZMwwmXE51zK8k1SzzJoUMAHZRYOp+afz1DuroKp6yCwOhTCOx6eF62fcwx47n33jvZtOlrysuzXy05b94cvvvdM0gmk8yYMZ1PP/2E9evXs8suu3D99Tdsve3ChQt4+OEHueeeB8lkMtxzz+28+eYbDBw4kHQ6zf77H9jsNu644xbWrVvLNddcyamnnrZ1OwC/+93DPP/8XFzXZfToQ/nxjy/hvffe5bHHHiEUCvHZZ8vYeedduO66GygpKWmwP9YuZsSI7DSX4zgMGTKswTpz5sxk9epVXHTRxXl5/ERECp1bVrH1cEBszgxK9j6e5Mfz2Vy9HqdsQF7HlGYzdOnWi1Ri6XxqX39065c9ZKqrqH39URJL5+dl+5FIhLFjx/Hyyy8CsG7dWj7/fDkHH3woH3zwPn5/CQ888Ah//OOf2bx5M2+99WaT23n11ZdYssTy+ON/4pe/vImVK7NvP2luG5deehUDB1Zy4423NtjO/Plv8sYbf+ehhx7j4YefYOXKFfzlL89s3dZll13NE088zZdfruEf/3hrmxxLlixmxIidAVi27FNWrFjOySd/F4A333ydX//6V6xfX8Utt/wfGzZsyMtjKCJS6NzS/tl3B5SEqVs0k0x1FZDJ+5jSHM0ANFK35E3q7N9bXCf15SeQTjZcmExQ+9rDJBe/1uztSsyRlOw2pk05Jk6cwkMP3c9//dcpPP/8XE44YSI+n49Row6gvLwvzzzzJz7//DO++GIFsVhs60xBfe+++w7jxh2N3++nf//+HHpo9r6b20ZzFiz4J8cddwKhUAjIHqKYO3c2I0aMZMSInRk0aDsAhg8fwebNmxrctrY2zuefL+fBB+/lvvvu5KuvvmLGjLvo3z/7VZljxoxlzJixbXpMRESKjRvph+M6ZBpfkUyQ+NczXToLoBmAjmg8+Le2vANGjTqAqqp1fPnlGubNm8ukSd8C4I03XuMXv7iWUCjExInfYr/99ieT2eafDpCdbq9/lc/na/c2ANLpdIPLmQykUtl9DQQCje6v4XaWLl1Knz59ePzxp/jDH/7Mt799Cg8//GDbHwgRkSKXqWl65jPTxV8prBmARkp2G9Pqq/TqJ69o8olxyiqy7/PMkxNPnMTvfvcw5eXlDB06DMi+Gj/mmOOYNOlbrFz5Be+++w6jRx/c5O0POuhgnnzyMU466WTi8Tj/+Mdb7L33vs1uw+fzkUqlmtjOaB5++CFOOunb+Hx+5sz5GwcccFCb9mHJksXsscdeWy+ffvpZnHLKZDZs2LB1FkBEpDdzyiqaHVO6kmYAOiAw+hTwBxou9Aeyy/No4sQpzJr1162v/gGmTPk2L744j7PP/h7XXjuNffbZl1WrVjV5+7Fjj2L//Q/k7LO/x7Rpl7PTTiNb3MaAARVst91gLr74ggbbOeKIIzn88CM477yzOeus77LddoM55ZTvtWkfli61DQrAwP+/vXsPs6oq4zj+ZWYEhS6iYKkJ3vBFScIoCdJETSsfL5Ua1UOKiVFqWYZ2VdMSy1uPUZl5yUu
Z5a2M1FDUSlDzsSQ1/GkmPpaX0EwFiYvQH2sdPRwZhhnPZuPZv8/z8DBz9F6SJgAAC59JREFU9t5rv/ucmVnvXmvttQYMZNiw7Zk5s/OuEjOzKllTdUqjXqtq+m0xmwMPP/30fJYtW/Gan3jiEd785sHdKmzxg7NYUtBTAGubjo42li5d1vWOBerJZ9RdAwe+nnnzni/0HGsDX2dr8XW2hsUPzmLxnVeyvIlPAbS19WLDDV8HsAUwt3G7uwB6qPeQMfTddqfSK0YzM3vt6z1kDL2HjFmjiY67AMzMzCrICYCZmVkFOQEwMzOrICcAWYUGQ77m+LMxM2s+JwBAR0dvFix4zhXNWmj58uUsWPAcHY2PyJiZ2avipwCA/v0H8swz85g/f+VL7Xamra3tFbPktaKyr7Ojozf9+w8s7fxmZq3ICQDQ3t7BgAEbd/u4Vn8utaYq12lmViXuAjAzM6sgJwBmZmYVVKUugHZIUyM2U7PLW1v5OluLr7O1+DpbS7Ous66c9pVtr9JaADsBfyw7CDMzszVsZ+DWxherlAD0Ad4JPA68cs1bMzOz1tIObAzcCSxq3FilBMDMzMwyDwI0MzOrICcAZmZmFeQEwMzMrIKcAJiZmVWQEwAzM7MKcgJgZmZWQU4AzMzMKsgJgJmZWQVVaS2ApoqINwCzgL0lzS05nMJExAnAR/K3v5V0bJnxFCUiTgIOAJYD50s6s+SQChMRpwMDJE0oO5aiRMTNwEbAkvzSJEl3lBhSISJiH+AEoB8wXdJRJYfUdBExETiy7qUtgEskHdnJIa9ZETEe+Er+9jpJk4s8n2cC7IGIGAWcCwwFtmnVBCAi3gucCOxKqhivB74v6epSA2uyiNgFOBkYC6wD/A14vySVGVcRImJ34DJSMjeh5HAKERG9gH8CgyUtLTueokTElqT1TUYBTwI3AVMkXVdqYAWKiGHAr4DRkp4qO55mioi+pJ/bbYD/AjOBr0m6sahzugugZw4DjgAeKzuQgj0OfFHSYklLgDnAoJJjajpJvwd2zZXFRqSWsQXlRtV8EbEBKdGZUnYsBYv8//SImB0RLXenmH0I+IWkf+bfz3FAy7VyNDgb+GqrVf5ZO6lO7ke6EVkHWFjkCZ0A9ICkiZJafmVBSfdJuh0gIoaQugKuLTeqYkhaEhEnku7+ZwD/KjmkIpwDfA14puxACtaf9Bl+CNgd+HRE7FFuSIXYGmiPiGsi4m7gcFr4s80tkutJurzsWIog6XngOOB+UkvAXFI3c2GcAFiXcrPbDcAxkh4sO56iSDoBGAhsRmrlaRm5H/VRSTPKjqVokm6TdJCkZ/Od4vnAXmXHVYAO4L3AocBoUlfAwaVGVKxJQCuPzRkOfBIYDGxCWrW20DEATgBslSLi3aS7qS9LuqjseIoQEUMjYgSApBeAq4Dh5UbVdOOAPfOd4knAvhHx3ZJjKkRE7JTHOtT04uXBgK3kCeBGSfMkLQSuBnYsOaZCRERvYBfgmrJjKdD7gBmS/i1pEXAhaVxSYfwUgHUqIjYjDbgZJ+mmsuMp0JbAiRGxE2mw437ABeWG1FySXmoCj4gJwFhJXygvokKtD5wUEWNI/agHA58uN6RCTAMuioj1geeBD5B+X1vRcOABSS03NqfObODUiOgHvADsA9xZ5AndAmCrMhlYFzgzIu7O/1ruD6mka4HfAn8B7gJmSbqs3KispyRNY8XP8wJJt5UbVfPlxxpPBW4ljV15BPhJqUEVZ0tSv3jLkjQd+DnpZ/avpOT120We048BmpmZVZBbAMzMzCrICYCZmVkFOQEwMzOrICcAZmZmFeQEwMzMrIKcAJg1iIjNI2J5RBza8PrkiLiwieeZGxHvaFZ5XZzrDRExMyLui4gPd7HvvRExdk3E1QwRcXZEPBwRJzehrLERcW8z4urh+feNiO91sc/mETF/TcVkrcsTAZmt3DLgjIi4tUVWBRwBvEnS1mUHUoBJwCBJr/nnxCVdQ2vPdmd
rEScAZiu3EDgDuDQiRktaXL8xtwTcK+n0xu8jYi5wKbAbaWGaU4F3AyNJU9LuK6m2kuQREfE2oA9whqQLcnn7AF8HepNmBZss6baI+AZp3vdNgNmSxjfE9UHS+vBtpNnhjgaeJc1suGmeCnh0njq2dsx2eXtf0kIk/eq2jQG+k197EThR0rSIaAdOA/bN5d8BbCdpbETcAvyHtFz22cDFwFnA9qTJTWaQ1pVYGhHb5m0bklZD+17tPWi4rmHA9/N+y/N7dXFE/JE01e91EXF4/SJd+b3amrS2w8bA3cBESc91Vl7dsf1IE8+MkvRAfu1GYCppkaHn8vVsRpq05SBJ8yNi5/y+9AUWA1+XdH2efXH//LkMzmWfS1rnfhvgTEln5P0OkLR3RLyL9LPTJ8d/g6QVWqXMXg13AZh17mRgPj1bPnddSe8Cjgd+DJwl6W3Ao8CEuv0WSno7sAdwSkQMyysvTgH2krQD8CngqlwpQapAdlhJ5T8U+BGwfz7X8cCvScs6TwQekjSivvLPfgacK2k4qTIenMvrT5pZ7hM5xv2AsyNiUC5vJPBWUkKyVUOZz0jaTtJU4LvAXZJGAjsAA4CjI6IDuIK0zsRI0lzvk3PFV39dHaS74qk5xg8AU3JitnPebddOVujchbSK5VBgKXD8qsqrHZSnnL0oXycRsRWpop6WdxkJvB/YFtgcODAiNszXc1Qu92DgpxGxRT5mZ9KUxMNJicNHSasV7gV8KyIa/x4fBRwvaRSwHWn9hpEruUazHnECYNYJScuA8cAhPVhO9sr8/0PAE5Jm132/Qd1+5+RzPQZMJ1UIe5Du+GbkO/afkbokas33t0taupJz7kZaTOQfucybgH+TKquVypXWcNJdOpJmArU+8NE5jl/lOK4l3S0PJ1VaF0v6X24dOaeh6PrKeG9gUi7jLtKCNduTKtStgAvytt8D65GShHrbkBKqq+reqytJFXBXLpf0ZP4szyctuLK65f0QOCgi1iElYedJejFvu17SIklLgHtIn+ko4O95il4k3QfM5OUFXe6U9GiO5WFgev76IdKU230bzn8wsH5EfDXHsh7wutW4ZrPV4i4As1WQ9GhETCLdDV5ct2k5qem5pnfDoYvqvl7VSnQv1n3dlvftIFXk42ob8sJMj5GanzsbANae46rXRmp2X/zK3VdQfy215KIdmJPvQGtxbALMAw5pOKb+OmiIsR04UNKcXMb6Oc5BwLOSRtSV/yZSl8LqXldX6hOlthznapUn6YGI+Cup5ePjpAq+pr4Vpfaz0NX7v6hhW1crFP6B1L1wPfDLfP5eqzzCrBvcAmDWBUlXANcBn697eR7wDnipUtylh8VPyGUMIq3tPiP/2zM36RMRe5EqgvW6KGsG8L6I2DIftxupqfmOzg6Q9DTprrzW1P120t05wO3AkIh4T942AngQ2JS02M74iOiTm9Qn8MrKr+Z3wBcioldE9CE1vx8JCFgYEeNz+ZuRWh8aWyzuB5bUnl7I7/f+wA1dvB8A+0XEG3Pz+mHAb7pZ3g9Iffp/qhu30ZnbgKERsWMudxjwHuCW1YhzBTlJeifwpdxS8RZSC1B7d8sy64wTALPV8znSams1U4GNI0KkfvKeLpe8bkT8mdS8/llJD0j6G6nJ+bKImA18kzRwcJWPfuXjDieNF7iXtJLYPpIa76gbfQz4aETcAxwHzMnlzSNVjKflOC4hjQeYS1qr/A7SinuzSHe4L3RS/udIgwjvISUy9wCn5q6D/YCJ+U57OnBc7oaov64lwAeBo/J+NwInSbq5i+sCeJL03s4htSxM6WZ500jN7j/q6kSSngIOBKbm9/JS4JDaIMLukPRf4BTgz/mz/DKpO6EVn+Kwkng1QDPrtojYE9hI0k/z92cB/5P0pXIje1l+CmCApCNfRRmjgfOAt0ryH0trKR4DYGY9cR9wTEQcS2qWng18ptyQmisiLiIN4Bvnyt9akVsAzMzMKshjAMzMzCrICYC
ZmVkFOQEwMzOrICcAZmZmFeQEwMzMrIL+DxCgaABFwt9iAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from sklearn.metrics import r2_score\n", "\n", "# Polynomial degrees and the fitted models / design matrices that go with them\n", "x = [1,2,3,N]\n", "models = [model_1, model_2, model_3, model_N]\n", "X_trains = [X_train_df, X_train_df_2, X_train_df_3, X_train_df_N]\n", "X_vals = [X_val_df, X_val_df_2, X_val_df_3, X_val_df_N]\n", "\n", "r2_train = []\n", "r2_val = []\n", "\n", "# Score each model on its own training and validation design matrices\n", "for i, model in enumerate(models):\n", "    y_pred_tra = model.predict(sm.add_constant(X_trains[i]))\n", "    y_pred_val = model.predict(sm.add_constant(X_vals[i]))\n", "    r2_train.append(r2_score(y_train, y_pred_tra))\n", "    r2_val.append(r2_score(y_val, y_pred_val))\n", "\n", "fig, ax = plt.subplots(figsize=(8,6))\n", "ax.plot(x, r2_train, 'o-', label=r'Training $R^2$')\n", "ax.plot(x, r2_val, 'o-', label=r'Validation $R^2$')\n", "ax.set_xlabel('Degree of polynomial')\n", "ax.set_ylabel(r'$R^2$ score')\n", "ax.set_title(r'$R^2$ score vs polynomial degree')\n", "ax.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We notice a big difference between the training and validation $R^2$ scores, which suggests that we are overfitting. **Introducing: regularization.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What about Multicollinearity?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There seems to be a lot of multicollinearity in the data. Take a look at this warning that we got when showing the summary for our polynomial models: \n", "\n", "\n", "\n", "What is [multicollinearity](https://en.wikipedia.org/wiki/Multicollinearity)? Why do we have it in our dataset? Why is this a problem? \n", "\n", "Does regularization help solve the issue of multicollinearity? " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What does Regularization help with?\n", "\n", "We have some pretty large and extreme coefficient values in our most recent models. These coefficient values also have very high variance. 
We can also clearly see some overfitting to the training set. To shrink these coefficients, we can add a penalty term to the loss function that penalizes extreme coefficient values. Specifically, regularization helps us: \n", "\n", "1. Avoid overfitting by shrinking the coefficients of features that have weak predictive power.\n", "2. Discourage the use of a model that is too complex.\n", "\n", "\n", "\n", "### Big Idea: Reduce Variance by Increasing Bias\n", "\n", "Image Source: [here](https://www.cse.wustl.edu/~m.neumann/sp2016/cse517/lecturenotes/lecturenote12.html)\n", "\n", "\n", "\n", "## Ridge Regression\n", "\n", "Ridge Regression is one such form of regularization. In practice, the ridge estimator reduces the complexity of the model by shrinking the coefficients toward zero, but it doesn’t set them exactly to zero. We control the amount of regularization using a parameter $\\lambda$. **NOTE**: sklearn's [`Ridge` class](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html) represents this $\\lambda$ with the parameter `alpha`. In Ridge Regression, the penalty term is proportional to the squared L2-norm of the coefficients (the sum of their squares). \n", "\n", "\n", "\n", "## Lasso Regression\n", "\n", "Lasso Regression is another form of regularization. Again, we control the amount of regularization using a parameter $\\lambda$. **NOTE**: sklearn's [`Lasso` class](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html) represents this $\\lambda$ with the parameter `alpha`. In Lasso Regression, the penalty term is proportional to the L1-norm of the coefficients (the sum of their absolute values). \n", "\n", "\n", "\n", "### Some Differences between Ridge and Lasso Regression\n", "\n", "1. Lasso regression tends to produce estimates of exactly zero for a number of model parameters (we say that Lasso solutions are **sparse**), so we consider it to be a method for variable selection.\n", "2. 
In Ridge Regression, the penalty term is proportional to the squared L2-norm of the coefficients, whereas in Lasso Regression, the penalty term is proportional to the L1-norm of the coefficients.\n", "3. Ridge Regression has a closed-form solution! Lasso Regression does not, so we have to solve it iteratively. In sklearn's `Lasso` class, the parameter `max_iter` sets the maximum number of iterations the solver performs. \n", "\n", "### Why Standardizing Variables was not a waste of time\n", "\n", "Both Ridge and Lasso regression put constraints on the size of the coefficients associated with each variable. However, the size of a coefficient depends on the scale of its variable: a predictor with a small numeric range needs a larger coefficient to have the same effect on the prediction, so it would be penalized more heavily. It is therefore necessary to standardize the variables so that the penalty treats them all on the same footing. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's use Ridge and Lasso to regularize our degree N polynomial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise**: Play around with different values of alpha. Notice the new $R^2$ value and also the range of values that the coefficients take in the plot." 
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R squared score for our original OLS model: -1.8608470610311345\n", "R squared score for Ridge with alpha=100: 0.5869651490827923\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABB4AAAHwCAYAAAAB0KxmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nOzdeZhkVX3/8Xf3MM5Az6AwdgKIgBtfV0RlMW4oEgwGRaO4oAE0gEZQo6IxIi64K+K+sogRUSOIQQFBEXfBFUXRr8sPUMIEJ4PLLPbQMz2/P85tKHqqV+r0UvV+Pc8803Xr3nO/99atrtOfOvfevs2bNyNJkiRJklRD/1wXIEmSJEmSupfBgyRJkiRJqsbgQZIkSZIkVWPwIEmSJEmSqjF4kCRJkiRJ1Rg8SJIkSZKkaraa6wJUV0RsBgYz8/9aph0JPDUzD46Ik4DfZOZ/TtDGa4CfZOZ/Vy+4wyJiT+Bc4E/AUzLz2srr+yJwTmaeGRFXAo/OzD+NM+8dgfMyc/+aNbVZ75E0r3/l9RwInArcCOyXmX+dYTsDwOuBJwIbgM3AF4A3tmszInaivAYPm6TdC4HjM/PqGda1xXurhog4Hrh/Zh5Zo56IeCJwQGa+KCL+Edg3M18z1eMkIh4NfBV4XGZ+uWX6+4H/y8zXTbL8fYGPAssor+0rM/Pi6WyDpNln/8L+RZv1HskC6V80n10XAdkyeTlwNXBkZq6OiNOAT2fmV8YsuxfltdhtZluwRS1nAkcA+2fmZS3TdwP+H/DBzDxuGu1Nqb6I+Brw/sw8Z5r13tLPioi7ASdn5lOaen+Wmcum096Ytl8H3Hmi7W1eu/dn5v1nup5x2u0DzgSuysyTm2mLgHcC/0D5u/nkzPxw89y9gNOBOwNrgcMz85edrKnbGDz0uMx8zRRm25/yi3gheiJwWWYeNdsrzsw9J5llO2Cf2ahljjwDODUz3zjTBiJiK+ArwHeBPTNzfURsA7wFuDgi9s/Mja3LZOYNwIShQzPf42daVzfJzPOB85uHewPbz6CZm4GPR8QeMwhiPgickZlnRMSDgK9FxIqxr6ukhcX+RT32L25//6Lx29Z92fyReS5wPPAfs/za/g74Z+CylmmHA3+YxRqmZEw/a1cg5rCcjoiI+wAfAPYFrmp56nnA7sD9KcHUdyPiR5n5PeCTwLsz8+yIOAg4JyIekJmbZ7n8BcPgocc1KevPMvPkiHg98GTKHxGrgSOBfwL2At4REZso32x+ANiT8u3kRcCrMnNjRDweeBuwCbgSOAB4BPBo4F+AAeDPwMHAh4B7ASuANcBhmZlN+vpD4KHA31C+Cd0B2K9Z/mmZ2foLYXQ7TgSeCWwEfgUcBzwWeAGwKCK2zsxnjVlmI/BW4KCm7Vdl5ueaxP6WejPzMRHxL01b/c2+OS4zf9mkvh8HdgKua2oebf+Wb4Mi4j8oafZG4NfNvv0YsHXzzcVDKL/E3wFs07wGr87ML7Wrp2UdxwBPyMwnNI/vDVwK7NKs73nAHSh/TL41Mz80Zh98jZa0u/Vx80v4Pc1rtAh4b/PH4bKm9nsBI83r9bzMHGlp9+XAk4C/Nt+8vAo4pXlNNgFXAC/JzDURcW3zeI/mNTivpcRDgf7MfOnohCZ8+Dfgx8CTI+L7wDeBXwC7Ndv95cxc1oQUH6YcT3+i
6eBm5pHNep9K+ab9TZRvFe4PLG6259sRsTvleF8O7Eg5rp+emUOMIyKGmm09oGn7dc12PAC4oXm91kXEI2n/ei8G3gv8PaXDcSPlfTP6LdZ7mrYWU17rl4/3R3pEvBtYk5knRsSOzfr3z8zLIuLZwBMo7+GnAm8Ank95v/yZcpzuGBEXUI6njZT36S/arOo3lNfwY02bY+t4IvD8ccKeRZROMpT9PO6+lbRw2L+wf8H87l+0sy0wCHy7Tc3/CryEcpzdcpxM0s+4C/D+Zp8tpoyeePM46/408C/N8TQ6guPpwH/RnBofETtTju/dgD7g45n5jua5tvU1z50APKVp51rgBU14sIWIeBLwssx8ZPM4m7pf26z/e5T33k+BOwKnAXeJiIspx8SiiPgwJfi6I/CKzDy3zXpeBRwCbE05/o4f+/o0r9+nKP2hOwHvbDnOlkXEp4F7A0uBozPzmxP125pRWOOFosc22/K7MdOfDHy06Wf9sVnnsyPif5p1f7pp86KI+BDwIOBH7fatvMZDr7gsIq4c/QecNHaGiLgr8G/A3pm5F3AJZcj1B4AfUP64OY/yB9Fqyh8+ewEPBI6PiBXAJ4BnN+nxZcBdWlZxP8qwwMdQPoj/lJl/l5m7A9+nfJCP2i0zHw48G3g78LWmpi8BL2xT+3OaNvfOzD2AnwFnZuYnKR8GnxnbKWgsAtZn5kOApwFnRMTg2HojYj/Kh+wjM/NBTU2jvxw/AFyemfcDXkT5JTS2vidSOgJ/1wwLu6bZ3ucAf232152Ac4AXN9twBHBWM4Rt7P5r9SngERGxQ/P4OTQdDuBo4PFNzU9v6p6SZqTBOZRh7w+hdMyOj4iHUn4JL2/q3rtZ5O6tyzcfhOcD78rMlwOvpnSeHtj866d0gkb9LDPv06ZT8DDgG2Pra9LkSykffgA7A29ojqeVLbOeSAlY703pqD5onE3el/KB9iDK/hvtGBxN+WB/KHBP4G7AP47TxqglwP9m5j6UTuNplPfWfSkfwoc075fxXu8XUNL1+1I+bHdpaftdwA+b1+RBlOF9L2V8n6O8N6AME/zfpk0o39bd0hnIzCu49f1yQjP57k2ND6C8DsdPsK4XAbtHxBbDIzPz/AlGmBwL/EdEXE8Z3fKvjnaQFgz7F/YvFmr/AuAezbH784j4A6VfcT7w7jE170n5EuFRmbk3JbwZNVE/4xOUEX0PofwhfkBEPG2cXbOKMrrzic06H0H5QuWmlnk+SRll8wDg4ZQ/gJ8xUX0RcTjlPbVPs18vpPRLxnMxsEdE3CnKqRPbctt+w+cpoRCZuQk4ijJy5HHNPEspX/48mNJn2OLYiIhdKfvq0c0xeQJtfnc0tqccC48GToqIBzTTd6YcA3sCH2m2Hybot2Xma8YJHcjM4zLz7DZP3RX4fcvj65t13xW4oTUUa3lO4zB46A2Pycw9R/8B7d50/wP8BPhRRJwMXJmZn28z30GU9HdzZm6gfPAeBDwKuDozfwKQmR8H/tKy3E8z8y/Nc+cAZ0bECyPiPZRfJq3ng32u+f+3zf9fanncbhj4QcDHMnNd8/g9wGMj4g7tdsYY729q+iklIX7U2Hopv7DuCXyn6Vi9HdguIran/OI8s2njN5RvbMY6APhsZv6xme+lmfmmMfPsSzkX9opmnp9TEvdHt6nnFpm5hrK/nh1liOCzgNMzcy3lm59/jIg3UH6pT+ecu92Be1A6S1cCX6d0Nh4EfAu4X/NNwCspw8x+M0l7BwEfzszh5pf0+7j1D2IoIxbGs3ic6Uso34pB+abnu23meTxlf4w0++/j47R1XWZe2fz8I249zv4dWBURr6B8y7ATU9uPo3/Q/5ZyruD/NNt9TdP2RK/3AcDZmXlzc0x/sqXdg4HnNa/JDykdmQcwvm8BO0fE31KChzcCf9+8N/ajdEAm8r2W1/ZKWr5xG6up9ZnAmyJiSuddRsRS4DOU82l3prz/PtL8oSJp/rN/MT77F+3Np/7F
b5tj936Uz/sdKftzeMx8jwUuycz/bR5/tOW5tv2MKNen2g94Q7Odl1O+SJjoNJn/pIRiUAKiM0efaNp7OCWQIjP/3Dx/0CT1HUwZjfGDpo4XMsGpEVlGW3yFEjYcRPmj/m7N6JJDaPnCYhw3560jHNr2GzLzOsppJM+KiLdSRluOdwx9oPmdcD3l/XpgM/23o8f0mPXMtN82nn5u7WtCGWmyqc301uc0DoMHAdD8st6PkpyvBt4VEe0S7LFvtH7KH4YbKW+4Vq0p4NrRH5rhYKcD64GzKal667IbxtQ29gNgrEVtatqqTT3ttH6z2s+tvzDWtkxfBHyipWP1YMq3MX9s1tu6nnbf1G5sra8lRZ5oG0brGf2jey3jO5XyC/wfgF9k5jXNcLgrKefefYvyjUA7Y+sf7Uwtogy7bO1QPpTSAbuG0lF6CyUJ/0pEbDG8fpLta922ibbv28CjIuI2v6uax48CvtNM2pDtvyUfe1yO94HQemGq1n3yKeAYyjDXd1FCiakcV63HcLvjd7LXe7xjahFwaMtrsi+3/TbvNpr39RcpHaN9KcfKjpRTP77TdCAn0lr72GOl3fp+RAk3PkX51mMy9we2ycwvNstfDvy8qVVSF7B/ccty9i9urWc+9C9uIzM/Rhnt8NlmVMZY470W4/UzFjXTHzZmO8c71YJm/fs24fujuDUYg7JdY4+7qfYb3tZSw16UAGMi51H6DQdSRkB8nXJ6y/2Br02y7KT9hoh4MOXLom0pI6De1m6+NtvS+j4abz0z7beN53eU8GLUTpSRDb+jnI7a1+Y5jcPgQQBExAMpQwh/kZlvobxZR4e5beTWX2wXA8dFRF9ELKG8ub9M+QNx94jYo2nvKZThfe0usPI4ylDF0ylXE34C5RfjTH0JeG6TBkMZkviN5huTyRze1PtgyjC5r7eZ52LgmVHOkYeSzF7asu5jmjZ2AcYOVYSSHP9TRGzbPH4dZXj8Rsq5cH2UX8D3joh9mrbuR/nQ+dpkG9D8sdZH+abp1GbyXpRhe2+k/FI/uGl37H5e1cw7eneBPUabpZw/+ezmubtSjo+HNB27j1HS9X9v9s+DJynzS8C/RsTiJjQ4lnLcTOYcYB3w7ojYuqlla8o3Gmu5dUjqeC4AnhMR/VHOwzyM9sfkeB4HnJSZn2ke78vtO1ZHTfR6XwQcHhFLmxEBT29Z7mLgJS3vv/OZIHhofA54BWXkxc2Ub83eQvtvLVrf6zN1MuWUjmdPNiPl2hB3jIiHAUTEPSinmPz4dtYgaZ6wf2H/Yp72L9r5d8oQ+mPHTL8EOLAJXaCEaKPa9jOa0Q+X05wOGRF3ohzLh4y38ua4Oo8y8uELrV+oNCNQLh+trRmBcDhlWyeq72LgqJZj5CTKKSAT+QJlFMWelGs6XEK5DtRFWU6vaDWTfsOjgB9k5incGmqM9z4dfR/tQglCLpqk7U732/6b8jtgq+Y1fAbw+WYExm9o+mgR8ThKILrFdWJ0K4MHAdAMYfwvylCsHwDP5dZzx88H3hIRR1A+dP+G8sa6ivIB8qbMvIkyzPo/I+JHlDf+Rsq3DmOdTBku/lPKELgfURLumTqd8uH7vYj4BeVDqt05l+08vKn3DMrFZ/44dobMHE1jv9zUfBjwT1muM3AscN9mvadTvgUYu/yFlA/Sb0fEVZSLWZ1AuRbB9yjf8G6mfAv9vmaes4HnZOavprgdp1LOgxwdvnoJJXVNyjmCu1A6AWP38xspH1Y/o3wYfaOp+WbKh+NRzTZfApyYmd+mfCAuAq6OiB9Srlvw3knqeyPlj9Erm3oWAy+ebKOaD90DKSHDD5s6f9Q8/vspfFv1FsrFCq+iHCN/oP0xOZ5XAec1r8lHKB+Qt+dYBSDLnR/Ge70/Qjnv+WfN+q5pWfRFlIswXUW5sNNVTH5u7VcoKfxoR+xi4G8pHYuxvgo8LiLeN4PNAm65/sbhlIu6AeU85Ci3Lx07758o5/S+p9kP5wDHZOZv
x84raWGyf2H/gnnYv2in+Uz6d+D1UU5RHJ1+FSXAv7Q5hltH9E3UzzgMeGiz368APpXl+iAT+U/KaTBntnnuWZRTfa6ivL6fowRtE9V3GmXk4+UR8XNKAHTkRAU0p3H8AvhxEzRcTAlk2n1hcTUwFBHfY+ojCz4F3Lk5tq+m9Om2j4jlbea9W3MsfAl4UWZmm3lajdtvi4iTornA5DR8iHIq1k8o14w5PTNHQ8RnAs9vjvE3UUakjrRvRgB9mzd7xw/dfk2S+mrgdVnuOvBgSgq8U87T28pEm3uQq7tExDOAv2Tmhc03IedSvkn50CSLSpLmAfsXms/sZ9QTzd3HMvMHc1yKOsQRD+qIZkjZzcD3o1y85iOUW1PNy06BesbPgBOaY/JnlNtJTnQ1Z0nSPGL/QvOc/QxpihzxIEmSJEmSqnHEgyRJkiRJqqbd7WLmqyWUqyCvxHukSpLUahHlVqnfZ8wtA9Vx9kckSdrShH2RhRQ87E25QrEkSWrvkcC35rqILmd/RJKk8bXtiyyk4GElwB//uI6RkYV3XYoVK5axevXauS5j3nG/bMl90p77ZUvuk/Z6cb/09/ex3XYD0HxWqqqO90e68Zh1mxaGbtumbtsecJsWgm7bHpjZNk3WF1lIwcMmgJGRzQsyeAAWbN21uV+25D5pz/2yJfdJez28Xxz6X1+V/kg3HrNu08LQbdvUbdsDbtNC0G3bA7drm9r2Rby4pCRJkiRJqsbgQZIkSZIkVWPwIEmSJEmSqjF4kCRJkiRJ1Rg8SJIkSZKkagweJEmSJElSNQYPkiRJkiSpGoMHSZIkSZJUzVY1G4+IJwCvBQaASzLzxRFxAHAKsDXwmcx8dc0aJEmSJEnS3Kk24iEi7g58GHgSsAfw4Ig4CDgDOAS4D7B3M02SJEmSJHWhmqdaPJkyouH6zBwGng6sB36dmddk5kbgLODQijVIkiRJkqQ5VPNUi3sCN0fE+cAuwBeBnwMrW+ZZCew8nUZXrFjWsQJn2+Dg8rkuYV5yv2zJfdKe+2VL7pP23C+SJEnzR83gYSvgUcCjgbXA+cBfgc0t8/QBI9NpdPXqtYyMbJ58xnlmcHA5q1atmesy5h33y5bcJ+25X7bkPmmvF/dLf3/fgg7mJUlSd6sZPPwv8JXMXAUQEedRTqvY1DLPDsANFWuQJEmSJElzqGbw8EXg4xFxJ2ANcBBwDvDKiLgncA1wGOVik5IkSZIkqQtVu7hkZl4BvB34FnA1cB3wIeBI4Nxm2i8pYYQkSZIkSepCNUc8kJlnsOWIhkuBB9ZcryRJkiRJmh9q3k5TkiRJkiT1uKojHiR1r03AhuFp3ZTmdrnxpvUMzWB9Sxb3s6hCPZIkdbtOf9b7mSz1LoMHSTOyYXiESy6/dtbWNzCwhHXrNkx7uQMfuhvbLHZwlyRJ09Xpz3o/k6Xe5TtfkiRJkiRVY/AgSZIkSZKqMXiQJEmSJEnVGDxIkiRJkqRqDB4kSZIkSVI1Bg+SJEmSJKkagwdJkiRJklSNwYMkSZIkSarG4EGSJEmSJFVj8CBJkiRJkqoxeJAkSZIkSdUYPEiSJEmSpGoMHiRJkiRJUjUGD5IkSZIkqRqDB0mSJEmSVI3BgyRJkiRJqsbgQZIkSZIkVWPwIEmSJEmSqjF4kCRJkiRJ1Rg8SJIkSZKkagweJEmSJElSNQYPkiRJkiSpGoMHSZIkSZJUjcGDJEmSJEmqxuBBkiRJkiRVY/AgSZIkSZKqMXiQJEmSJEnVGDxIkiRJkqRqtprrAiRJkmYqIrYFvgMcDNwXeHPL03cBrsjMg8cscwTwVuDGZtIFmXnCLJQrSVJPMniQJEkLUkTsC5wK7A6QmRcCFzbP7QB8G3hJm0X3Al6amZ+apVIlSeppnmohSZIWqqOBY4Eb2jz3DuDDmfnrNs/tDRwREVdFxFkRsV3NIiVJ6nWOeJAkSQtSZh4FEBG3mR4R9wIeDRw1zqIrgZMpp2i8GXg/8KzprHvF
imXTK3YSg4PLO9refOA2LQwTbdONN61nYGBJx9a1dOliBrffpmPttdNrr9FC1W3b1G3bA53fJoMHSZLUbY4BPpiZG9o9mZlPHv05It4O/Ha6K1i9ei0jI5tnXmGLwcHlrFq1piNtzRdu08Iw2TYNDY+wbl3bt9GMDA0NV92HvfgaLUTdtk3dtj0ws23q7++bMJT3VAtJktRtngR8ut0TEXHHiGi97kMfsHFWqpIkqUcZPEiSpK4REXcGts7Ma8aZZS3wiubClADHAefNSnGSJPUogwdJktRN7g5cP3ZiRJwWEU/MzE3A04APRcQvgIcAr5jlGiVJ6ile40GSJC1omblby8/fAx7aZp6jWn7+JvDgWSlOkiQ54kGSJEmSJNVj8CBJkiRJkqoxeJAkSZIkSdUYPEiSJEmSpGoMHiRJkiRJUjUGD5IkSZIkqRqDB0mSJEmSVI3BgyRJkiRJqsbgQZIkSZIkVWPwIEmSJEmSqjF4kCRJkiRJ1Rg8SJIkSZKkagweJEmSJElSNQYPkiRJkiSpGoMHSZIkSZJUjcGDJEmSJEmqxuBBkiRJkiRVY/AgSZIkSZKqMXiQJEmSJEnVGDxIkiRJkqRqDB4kSZIkSVI1W9VsPCIuA/4GGG4mPQ+4B/BqYDHw7sz8QM0aJEmSJEnS3KkWPEREH7A7sGtmbmym3QX4NPAQYAPwnYi4LDOvrlWHJEmSJEmaOzVHPETz/yURsQI4FVgDfDUzbwKIiHOApwInVaxDkiRJkiTNkZrBw3bApcALKadVfA34DLCyZZ6VwD7TaXTFimUdKm/2DQ4un+sS5iX3y5YWwj658ab1DAwsmdV1zmR9S5cuZnD7bSpUMz8shGNlLrhfJEmS5o9qwUNmfhf47ujjiDgdOAV4Y8tsfcDIdNpdvXotIyObO1LjbBocXM6qVWvmuox5x/2ypYWyT4aGR1i3bsOsrW9gYMmM1jc0NLwg9udMLJRjZbb14n7p7+9b0MG8JEnqbtXuahERj4iIx7ZM6gOuBXZsmbYDcEOtGiRJkiRJ0tyqearFnYCTIuJhlFMtjgCeDZwVEYPAOuApwDEVa5AkSZIkSXOo2oiHzPwicAHwY+CHwBmZ+W3gBOAy4Erg7Mz8Xq0aJEmSJEnS3Ko54oHMPBE4ccy0s4Gza65XkiRJkiTND9VGPEiSJEmSJBk8SJIkSZKkagweJEmSJElSNQYPkiRJkiSpGoMHSZIkSZJUjcGDJEmSJEmqxuBBkiRJkiRVY/AgSZIkSZKqMXiQJEmSJEnVGDxIkiRJkqRqDB4kSZIkSVI1Bg+SJEmSJKkagwdJkiRJklSNwYMkSZIkSarG4EGSJEmSJFWz1VwXIEmSNFMRsS3wHeDgzLw2Ij4GPAJY18zy+sw8b8wyewKnAdsC3wCen5kbZ7FsSZJ6isGDJElakCJiX+BUYPeWyXsBj8rMlRMsehZwVGZeHhGnA0cDH6pXqSRJvc1TLSRJ0kJ1NHAscANARGwD7AKcERE/jYjXR8Rt+joRsSuwdWZe3kw6Ezh09kqWJKn3OOJBkiQtSJl5FEBEjE7aAfgq8ALgz8AXgX+hjIoYtRPQOhpiJbDzdNe9YsWy6Rc8gcHB5R1tbz5wm2bXmnU3s37D9M4YuvGm9bBo0bjPL9oMAwNLbm9pt1i6dDGD22/Tsfbamc+v0Uy5TfNft20PdH6bDB4kSVJXyMz/Bzx59HFEvA84nNsGD/3A5pbHfcDIdNe1evVaRkY2Tz7jFAwOLmfVqjUdaWu+cJtm3/rhES65/NppLTMwsIR16zaM+/z+++w64fPTNTQ0XHUfzvfXaCbcpvmv27YHZrZN/f19E4bynmohSZK6QkQ8ICKe0jKpDxgeM9v1wI4tj3egOVVDkiTVYfAgSZK6RR/w7ojYLiIWA8cAt7mjRWZeBwxFxMObSf8MXDS7ZUqS1FsMHiRJUlfIzJ8CbwG+DVwNXJmZnwKIiAsjYq9m1mcB74qIXwLLgPfORb2S
JPUKr/EgSZIWtMzcreXnDwIfbDPP41t+/gmwz6wUJ0mSHPEgSZIkSZLqMXiQJEmSJEnVGDxIkiRJkqRqDB4kSZIkSVI1Bg+SJEmSJKkagwdJkiRJklSNwYMkSZIkSarG4EGSJEmSJFVj8CBJkiRJkqoxeJAkSZIkSdUYPEiSJEmSpGoMHiRJkiRJUjUGD5IkSZIkqRqDB0mSJEmSVI3BgyRJkiRJqsbgQZIkSZIkVWPwIEmSJEmSqjF4kCRJkiRJ1Rg8SJIkSZKkagweJEmSJElSNQYPkiRJkiSpGoMHSZIkSZJUjcGDJEmSJEmqxuBBkiRJkiRVY/AgSZIkSZKqMXiQJEmSJEnVGDxIkiRJkqRqDB4kSZIkSVI1Bg+SJEmSJKkagwdJkiRJklSNwYMkSZIkSarG4EGSJEmSJFVj8CBJkiRJkqoxeJAkSZIkSdUYPEiSJEmSpGoMHiRJkiRJUjUGD5IkSZIkqZqtaq8gIk4G7pyZR0bEnsBpwLbAN4DnZ+bG2jVIkiRJkqS5UXXEQ0Q8FjiiZdJZwHGZuTvQBxxdc/2SJEmSJGluVQseImJ74E3Am5vHuwJbZ+blzSxnAofWWr8kSZIkSZp7NUc8fAQ4Afhj83gnYGXL8yuBnSuuX5IkSZIkzbEq13iIiKOA32fmpRFxZDO5H9jcMlsfMDLdtlesWHb7C5wjg4PL57qEecn9sqWFsE9uvGk9AwNLZnWdM1nf0qWLGdx+mwrVzA8L4ViZC+4XSZKk+aPWxSWfDuwYEVcC2wPLKKHDji3z7ADcMN2GV69ey8jI5slnnGcGB5ezatWauS5j3nG/bGmh7JOh4RHWrdswa+sbGFgyo/UNDQ0viP05EwvlWJltvbhf+vv7FnQwL0mSuluVUy0y8+8z8/6ZuSfwGuD8zHwOMBQRD29m+2fgohrrlyRJkiRJ80PVu1q08SzgXRHxS8ooiPfO8volSZIkSdIsqnWqxS0y80zKHSzIzJ8A+9RepyRJkiRJmh+qBw+SJEm1RMS2wHeAgzPz2og4BngR5dpSPwCel5k3j1nmCOCtwI3NpAsy84RZLFuSpJ5i8CBJkhakiNgXOBXYvXm8O/By4CHAGsqIy2OBd41ZdC/gpZn5qVkrVpKkHjbb13iQJEnqlKMpwcLoXbI2AC/IzL9k5mbgKmCXNsvtDRwREVdFxFkRsd3slCtJUm9yxIMkSVqQMvMogIgYfXwdcF0zbRA4DjiyzaIrgZMpp2i8GXg/5QLYU9bp25cODi7vaHvzgds0u268aT0DA0umvdxEyyxa1D+jNsezdOliBrffpmPttTOfX6OZcpvmv27bHuj8Nhk8SJKkrhIRd6Hcsvv0zPza2Ocz88kt874d+O1017F69VpGRjbfnjJvMTi4nFWr1nSkrfnCbZp9Q8MjrGPWjFEAACAASURBVFu3YVrLDAwsmXCZTZum3+ZEhoaGq+7D+f4azYTbNP912/bAzLapv79vwlDeUy0kSVLXiIh7U0YyfDwz39Dm+TtGxEtaJvUBG2erPkmSepHBgyRJ6goRsRy4BHh1Zr5znNnWAq9oLkwJ5XSM82ajPkmSepWnWkiSpG5xFPC3wMsi4mXNtPMz8zURcVrz8/kR8TTgQxGxNfAr4PA5qleSpJ5g8CBJkha0zNyt+fFdbHnrzNF5jmr5+ZvAg+tXJkmSwFMtJEmSJElSRQYPkiRJkiSpGoMHSZIkSZJUjcGDJEmSJEmqxuBBkiRJkiRVY/AgSZIkSZKqMXiQJEmSJEnVGDxIkiRJkqRqDB4kSZIkSVI1Bg+SJEmSJKkagwdJkiRJklSNwYMkSZIkSarG4EGSJEmSJFVj8CBJkiRJkqoxeJAkSZIkSdUYPEiSJEmSpGoMHiRJkiRJUjUGD5IkSZIkqRqDB0mSJEmSVI3BgyRJkiRJqsbgQZIkSZIkVWPwIEmSJEmSqjF4kCRJkiRJ1UwpeIiIF0bEtrWLkSRJvcm+hiRJ3WuqIx72AH4V
EadFxF41C5IkST3JvoYkSV1qSsFDZh4N3Av4AfDBiPh+RDw3IpZWrU6SJPUE+xqSJHWvKV/jITPXAJ8FzgZWAMcCGRFPqFSbJEnqIfY1JEnqTltNZaaIeCxwDHAApUPwpMz8aUTcA/gm8IV6JUqSpG5nX0O9ZBOwYXiko21u7mhrktRZUwoegA8AHwSOycw/j07MzN9GxKlVKpMkSb3EvoZ6xobhES65/NqOtrn/Prt2tD1J6qTpXFxydWb+OSJ2iIh/i4h+gMx8bb3yJElSj7CvIUlSl5pq8PB+4ODm5xHgkcC7q1QkSZJ6kX0NSZK61FSDh4dl5jMBMvMPwKHAY6pVJUmSeo19DUmSutRUg4fFEXGHlsdTvTaEJEnSVNjXkCSpS031Q/0C4OKI+ATlormHNdMkSZI6wb6GJEldaqrBw8sp99I+BNgIfA74SK2iJElSz7GvIUlSl5pS8JCZm4D3Nv8kSZI6yr6GJEnda0rBQ0Q8iXJl6e2AvtHpmbltpbokSVIPsa8hSVL3muqpFm8DXgr8iHLepSRJUifZ15AkqUtNNXj4U2Z+rmolkiSpl9nXkCSpS031dppXRMRBVSuRJEm9zL6GJEldaqojHh4PHBcRNwM3U8693Ox5l5IkqUPsa0iS1KWmGjw8tmoVkiSp19nXkCSpS03pVIvMvA7YGzgaWAU8rJkmSZJ0u9nXkCSpe00peIiIVwL/CjwN2Bp4bUScWLMwSZLUO+xrSJLUvaZ6cclnUM69XJeZq4GHAodVq0qSJPUa+xqSJHWpqQYPw5m5YfRBZv4JGK5TkiRJ6kEz6mtExLYR8bOI2K15fEBE/DQifh0RbxxnmV0i4hsR8cuI+O+IWNapjZAkSVuaavDw+4j4R2BzRCyJiBMAz7uUJEmdMu2+RkTsC3wL2L15vDVwBnAIcB9g73Fu0flB4IOZeW/gB4CndEiSVNFUg4fjgJcCewDrgIOaaZIkSZ0wk77G0cCxwA3N432AX2fmNZm5ETgLOLR1gYhYDDwKOKeZdObYeSRJUmdN6XaamXkD8NiI2AZYlJlr6pYlSZJ6yUz6Gpl5FEBEjE7aCVjZMstKYOcxi90Z+EsTTIw3z6RWrOjs2RmDg8s72t584DaN78ab1jMwsKQjbY1atKh/Rm1OtMxM2xzP0qWLGdx+m461147H3cLQbdvUbdsDnd+mKQUPEfHSMY8ByMxTOlqNJEnqSR3qa/QDm1se9wEjk8xDm3kmtXr1WkZGxjYzM4ODy1m1qru+03GbJjY0PMK6dRsmn3EaNm2afpsDA0smXGYmbU5kaGi46nHhcbcwdNs2ddv2wMy2qb+/b8JQfkrBA/CAlp/vAOwHXDqtSiRJksbXib7G9cCOLY934NbTMEb9AbhjRCzKzE3N/GPnkSRJHTTVUy2e0/o4InYCTq9SkSRJ6jkd6mtcURaNewLXUG7HecaY9QxHxDeBpwNnA4cDF820bkmSNLmpXlzyNprzMHfrbCmSJEnFTPoamTkEHAmcC1wN/JLmIpIRcVpEPLGZ9QXAMRFxNfBI4NWdqVqSJLUzk2s89AF7UYYqTrbcScBTKedSnp6Zp0TEAcApwNbAZzLTD3tJknrcTPsaAJm5W8vPlwIPbDPPUS0/Xwc8eoalSpKkaZrJNR42A78DXj7RAhGxH7A/5bZYi4GrI+JSypDH/YDfAxdExEGZ6RBHSZJ627T7GpIkaWGY0TUeprjM1yPiMZm5MSLu0qzrTjT31waIiNH7axs8SJLUw2bS15AkSQvDVE+1uIwtbz11i8zcf5zpwxHxeuB44LNM7f7aE+r0fbNnUzfe37UT3C9bWgj7pMY9yCczk/XNxj3D59JCOFbmgvtl4ZlpX0OSJM1/Uz3V4gfAfYGPAjdTrgC9FfDpyRbMzNdGxNuALwC7M/n9tSfUyftmz6ZuvL9rJ7hftrRQ9kmNe5BPZLJ7jY+n9j3D59JCOVZmWy/ul8nunb1AzLivIUmS5repBg+P
AB7R3O+aiLgYuDwzzx1vgYi4N7A0M6/MzPUR8TnKhSY3tczW7v7akiSp90y7ryFJkhaGqQYPg8BSYF3zeDkw2djluwOvj4hHUEY5HAJ8BHjHRPfXliRJPWkmfQ1JkrQATDV4OBu4vBm10Ac8DXjPRAtk5oURsQ/wY8ooh3Mz89MRsYpyf+2lwIU099eWJEk9bdp9DUmStDBM9a4Wr4mIH1Nuj/lX4HmZ+fUpLPc64HVjprW9v7YkSepdM+1rSJKk+a9/GvP+D/Az4ETKRZ8kSZI6yb6GJEldaErBQ0Q8B/gY8ArgjsB/R8TRNQuTJEm9w76GJEnda6ojHl4I/B3wl8z8A/AQ4N+qVSVJknqNfQ1JkrrUVIOHTZn5l9EHmfl7YGOdkiRJUg+yryFJUpeaavBwU0TsSbktJhHxLOCmalVJkqReY19DkqQuNdXbab6YctvLe0TESsrVpg+pVpUkSeo19jUkSepSUw0etqHcAnN3YBGQmTlcrSpJktRr7GtIktSlpho8fDIz7wP8omYxkiSpZ9nXkCSpS001ePhpRBwGfAtYOzoxMz33UpIkdYJ9DUmSutRUg4dDgEPHTNtMGQopSZJ0e9nXkCSpS00peMjMpbULkSRJvcu+hiRJ3WvC22lGxEdbfr5z/XIkSVIvsa8hSVL3mzB4APZq+fmSmoVIkqSeZF9DkqQuN1nw0DfOz5IkSZ1gX0OSpC43WfDQanO1KiRJkuxrSJLUlSa7uGR/RGxH+QZiUcvPgLe4kiRJt5t9DUmSutxkwcMDgP/j1g7A6pbnvMWVJEm6vexrSJLU5SYMHjJzOqdiSJIkTYt9DUmSup8f9pIkSZIkqRqDB0mSJEmSVI3BgyRJkiRJqsbgQZIkSZIkVWPwIEmSJEmSqjF4kCRJkiRJ1Rg8SJIkSZKkagweJEmSJElSNQYPkiRJkiSpGoMHSZIkSZJUjcGDJEmSJEmqxuBBkiRJkiRVY/AgSZIkSZKqMXiQJEmSJEnVGDxIkiRJkqRqDB4kSZIkSVI1Bg+SJEmSJKkagwdJkiRJklSNwYMkSZIkSarG4EGSJEmSJFWz1VwXIEmS1EkRcRRwXMukuwGfyMzjWuZ5LfBc4I/NpFMz8wOzV6UkSb3D4EGSJHWVzDwNOA0gIu4HfB543ZjZ9gKekZnfnd3qJEnqPQYPkiSpm30IeFVm/t+Y6XsBr4qIXYFvAMdn5tCsVydJUg8weJAkSV0pIg4Ats7Mz46Zvgz4MfBy4DfAmcCJwAlTbXvFimWdKxQYHFze0fbmA7dpfDfetJ6BgSUdaWvUokX9M2pzomVm2uZ4li5dzOD223SsvXY87haGbtumbtse6Pw2GTxIkqRu9TzglLETM3Mt8PjRxxHxTuAMphE8rF69lpGRzZ2okcHB5axataYjbc0XbtPEhoZHWLduQ0faGrVp0/TbHBhYMuEyM2lzIkNDw1WPC4+7haHbtqnbtgdmtk39/X0ThvLe1UKSJHWdiLgDsB9wfpvndomI57ZM6gOGZ6s2SZJ6jSMeJElSN9oD+FVmrmvz3F+Bt0fEZcC1wLHAebNYmyRJPcURD5IkqRvdHbi+dUJEXBgRe2XmKsppGF8AkjLi4Z2zX6IkSb3BEQ+SJKnrZOZ/Af81ZtrjW34+Fzh3tuuSJKkXOeJBkiRJkiRVY/AgSZIkSZKqMXiQJEmSJEnVGDxIkiRJkqRqDB4kSZIkSVI1Bg+SJEmSJKkagwdJkiRJklSNwYMkSZIkSarG4EGSJEmSJFVj8CBJkiRJkqoxeJAkSZIkSdUYPEiSJEmSpGoMHiRJkiRJUjUGD5IkSZIkqZqtajYeEa8FntY8vCAzXxERBwCnAFsDn8nMV9esQZIkSZIkzZ1qIx6agOFA4EHAnsBDIuKZwBnAIcB9gL0j4qBaNUiSJEmSpLlV81SLlcDLMvPmzBwGfgHsDvw6M6/JzI3AWcChFWuQJEmSJElzqNqpFpn589GfI+JelFMu3kcJJEat
BHaeTrsrVizrSH1zYXBw+VyXMC+5X7a0EPbJjTetZ2BgyayucybrW7p0MYPbb1OhmvlhIRwrc8H9IkmSNH9UvcYDQETcD7gAeDmwkTLqYVQfMDKd9lavXsvIyObOFThLBgeXs2rVmrkuY95xv2xpoeyToeER1q3bMGvrGxhYMqP1DQ0NL4j9ORML5ViZbb24X/r7+xZ0MC9Jkrpb1btaRMTDgUuBV2bmx4HrgR1bZtkBuKFmDZIkSZIkae5UG/EQEXcFPg88PTO/2ky+ojwV9wSuAQ6jXGxSkiRJkiR1oZqnWhwPLAVOiYjRaR8GjgTObZ67EDinYg2SJEmSJGkO1by45IuBF4/z9ANrrVeSJEmSJM0fVa/xIEmSJEmSepvBgyRJkiRJqsbgQZIkSZIkVWPwIEmSJEmSqjF4kCRJkiRJ1Rg8SJIkSZKkagweJEmSJElSNQYPkiRJkiSpGoMHSZIkSZJUjcGDJEmSJEmqxuBBkiRJkiRVY/AgSZIkSZKqMXiQJEmSJEnVGDxIkiRJkqRqDB4kSZIkSVI1Bg+SJEmSJKkagwdJkiRJklSNwYMkSZIkSarG4EGSJEmSJFVj8CBJkiRJkqoxeJAkSZIkSdUYPEiSJEmSpGq2musCJEmSOi0iLgP+BhhuJj0vM69oef4A4BRga+Azmfnq2a9SkqTeYPAgSZK6SkT0AbsDu2bmxjbPbw2cAewH/B64ICIOysyLZrdSSZJ6g6daSJKkbhPN/5dExE8i4rgxz+8D/Dozr2mCibOAQ2e1QkmSeogjHiRJUrfZDrgUeCGwGPhaRGRmfrl5fidgZcv8K4Gdp7OCFSuWdaLOWwwOLu9oe/NBt2zTmnU3s37DRm68aT0sWtSRNhdthoGBJR1p65Y2F/XPqM2Jlplpm+O5wx22YqS/r2PtAWyzZCuWD9zhlsfdcty1cpvmv27bHuj8Nhk8SJKkrpKZ3wW+O/o4Ik4HHg+MBg/9wOaWRfqAkemsY/XqtYyMbJ58xikYHFzOqlVrOtLWfNFN27R+eIRLLr+WgYElrFu3oSNt7r/Prh1ra9SmTSPTbnOybZpJmxNZPzTMV793XcfaAzjwobsxtL7U2E3H3Si3af7rtu2BmW1Tf3/fhKG8p1pIkqSuEhGPiIjHtkzq49aLTAJcD+zY8ngH4IbZqE2SpF7kiAdJktRt7gScFBEPo5xqcQTw/JbnrwAiIu4JXAMcRrnYpCRJqsARD5Ikqatk5heBC4AfAz8EzsjM70bElRGxU2YOAUcC5wJXA78EzpmreiVJ6naOeJAkSV0nM08EThwzbc+Wny8FHjjbdUmS1Isc8SBJkiRJkqoxeJAkSZIkSdUYPEiSJEmSpGoMHiRJkiRJUjUGD5IkSZIkqRqDB0mSJEmSVI3BgyRJkiRJqsbgQZIkSZIkVWPwIEmSJEmSqjF4kCRJkiRJ1Rg8SJIkSZKkagweJEmSJElSNQYPkiRJkiSpGoMHSZIkSZJUjcGDJEmSJEmqxuBBkiRJkiRVs9VcFyBJkiR1yiZgw/BIx9rb3LGWJKl3GTxIkiSpa2wYHuGSy6/tWHv777Nrx9qSpF7lqRaSJEmSJKkagwdJkiRJklSNwYMkSZIkSarG4EGSJEmSJFVj8CBJkiRJkqoxeJAkSZIkSdUYPEiSJEmSpGoMHiRJkiRJUjUGD5IkSZIkqRqDB0mSJEmSVI3BgyRJkiRJqsbgQZIkSZIkVbNV7RVExLbAd4CDM/PaiDgAOAXYGvhMZr66dg2SJEmSJGluVB3xEBH7At8Cdm8ebw2cARwC3AfYOyIOqlmDJEmSJEmaO7VPtTgaOBa4oXm8D/DrzLwmMzcCZwGHVq5BkiRJkiTNkaqnWmTmUQARMTppJ2BlyywrgZ2n0+aKFcs6UttcGBxcPtclzEvuly0thH1y403rGRhYMqvrnMn6li5dzOD221SoZn5YCMfKXHC/SJIkzR/Vr/EwRj+wueVxHzAynQZWr17LyMjmyWec
ZwYHl7Nq1Zq5LmPecb9saaHsk6HhEdat2zBr6xsYWDKj9Q0NDS+I/TkTC+VYmW29uF/6+/sWdDAvSZK622zf1eJ6YMeWxztw62kYkiRJkiSpy8z2iIcrgIiIewLXAIdRLjYpSZIkSZK60KyOeMjMIeBI4FzgauCXwDmzWYMkSZIkSZo9szLiITN3a/n5UuCBs7FeSZIkSd2rv7+P9cPlknE33rSeoeFpXT6urSWL+1l0u1uR1Gq2T7WQJEmSpI64eeMIX/3edcDML0Q91oEP3Y1tFs/2pfCk7uY7SpIkSZIkVWPwIEmSJEmSqjF4kCRJkiRJ1Rg8SJIkSZKkagweJEmSJElSNQYPkiRJkiSpGm+nKUmSukpEvBZ4WvPwgsx8RZvnnwv8sZl0amZ+YBZLlCSppxg8SJKkrhERBwAHAg8CNgNfiognZ+Z5LbPtBTwjM787FzVKktRrDB4kSVI3WQm8LDNvBoiIXwC7jJlnL+BVEbEr8A3g+Mwcmt0yJUnqHQYPkiSpa2Tmz0d/joh7UU65eHjLtGXAj4GXA78BzgROBE6YznpWrFjWgWpvNTi4vKPtzQdztU033rSegYElHWtv0aL+W9rrVLutbXbKTNucaJlO1zkb292J9pcuXczg9tvc7nY6xd8P81+3bQ90fpsMHiRJUteJiPsBFwAvz8xfj07PzLXA41vmeydwBtMMHlavXsvIyOaO1Do4uJxVq9Z0pK35Yi63aWh4hHXrNnSsvU2bSnsDA0s61u5om500kzYn26ZO11l7uzv1Gg0NDc+b96S/H+a/btsemNk29ff3TRjKe1cLSZLUVSLi4cClwCsz8+NjntslIp7bMqkPGJ7N+iRJ6jWOeJAkSV0jIu4KfB54emZ+tc0sfwXeHhGXAdcCxwLntZlPkiR1iMGDJEnqJscDS4FTImJ02oeBJwKvycwfRMTzgC8AdwC+BbxzLgqVJKlXGDxIkqSukZkvBl7c5qkPt8xzLnDurBUlSVKP8xoPkiRJkiSpGoMHSZIkSZJUjcGD9P/bu9MoS8rygOP/7qZnZIaAGlEWZUiCPCxRBwdH2UQG4oeAjgvICYgIAqIQQ9yiR0xQ0RA5CbggR0UddDQuJLjEEUcyGhQc0eg4yvIIRlDQYxSICsOs3flQ1cOl6fXeW7equ/+/c+bMrbpVdZ/3vn3feuup5ZUkSZIkVcbEgyRJkiRJqoyJB0mSJEmSVBlHtZAkSZKkCm0DNm0ZmvZ6v753AxvHWG/eYD+b29jeROYP9jPQ1S1KDzHxIGlW6+/vY0OXd8zd5o5ekqTZbdOWIVavvWPa6y1cOJ8HHtj0iPnLli5izY13diGyhzz3WXuzYNAL4lUNEw+SZrXNW4e6vmPuNnf0kiRJms3s6UqSJEmSpMqYeJAkSZIkSZUx8SBJkiRJkipj4kGSJEmSJFXGxIMkSZIkSaqMiQdJkiRJklQZEw+SJEmSJKkyJh4kSZIkSVJlTDxIkiRJkqTKmHiQJEmSJEmVMfEgSZIkSZIqY+JBkiRJkiRVxsSDJEmSJEmqjIkHSZIkSZJUGRMPkiRJkiSpMiYeJEmSJElSZUw8SJIkSZKkyph4kCRJkiRJlTHxIEmSJEmSKrND3QFIeqRtwKYtQ3WHMaHhugOQpFmk2+3+/MF+Brq2tcJ0Yvz1vRvYOIVlq4hT6lR/fx8butwPm6v9prHajam2D+OZN9jP5oa3l1Wo4vigl2U38SA10KYtQ6xee0fdYUxo2dJFdYcgSbNGt9v95z5rbxYMdvfC1unEuHDhfB54YNOky1URp9SpzVuHWHPjnV3d5lztN43Vbky1fRjPsqWLulo/M6UdquL4oJdlb/43LEmSJEmSZiwTD5IkSZIkqTImHiRJkiRJUmVMPEiSJEmSpMqYeJAkSZIkSZVxVAtJ0qyxjc6H6eqFmTJ0lyRJUjeYeJAkzRqbtgxx/ffv7miYrl6YKUN3SZIkdYO9HkmSJEmSVBkTD5IkSZIkqTIm
HiRJkiRJUmVMPEiSJEmSpMqYeJAkSZIkSZWZ86NabKN4CnrVOhnezWHXpNmtv7+PDW20D70eNnLeYD+bGz5M5XDdAUiSJOkR5nziYdOWIVavvaPyz1m4cH7bw7s57Jo0u23eOsSaG++c9nqdtCvtWLZ0UVtx9tKypYvqDkGSJEmjeDQrSZIkSZIqY+JBkiRJkiRVppZbLSLiJOB8YBC4NDMvqyMOSZI0O03W14iIxcAVwM7AdcDZmbm154FKkjQH9PyKh4jYE3gncDiwGDgrIg7odRySJGl2mmJfYyVwbmbuC/QBZ/Y2SkmS5o46rng4BliTmfcCRMRVwPHA2ydZbwCKp79308BAHzstGOzqNsey4FGD9A239zT4gYG+rpe7SWZz2drVq7/LTuww0N/TGNv9DfU6zna0G2Mn7Uo7Zsp3ubDH30s7ut2ut2zLQZAKE/Y1ImIRsGNmri2XXwG8Dbh8CtueEf2RKvoO04lxqu1T3XFOxUjb1802t4r2tJ1tTlambsdZdbm7VUdNqR8Yv0xVxNir32On9dTtsnej3L04nqni+GCisk+3TJP1RfqGh3s7+FhEvBlYmJnnl9NnAEsz86xJVj0c+GbV8UmSNIMdAXyr7iDqNllfIyIOAS7OzMPL6X2AVeXVD5OxPyJJ0vjG7IvUccVDPw8far0PmErK67sUhfgVsK2CuCRJmqkGgN0p9pWavK/Rbl8E7I9IkjSWCfsidSQe7qLYYY/YDfjlFNbbhGdxJEkaz0/rDqBBJutr3EXRORrv/YnYH5EkaWzj9kXqGE7zWuDoiNg1IhYALwauqSEOSZI0O03Y18jMO4GNEXFYOesU4Cu9D1OSpLmh54mHzLwbeAvwdWAd8KnMvLHXcUiSpNlpvL5GRKyKiIPLxU4GLomIW4GdgPfWE60kSbNfzx8uKUmSJEmS5o46brWQJEmSJElzhIkHSZIkSZJUGRMPkiRJkiSpMiYeJEmSJElSZUw8SJIkSZKkyuxQdwBzRUQcAVwKzAN+BpyamffVG1X9yjHUL6H4Xu4BTi/HVxcQEe8AtmXmBXXHUpeIOAk4HxgELs3My2oOqTEiYmfgBuC4zLyj5nBqFxH/ALyknPxyZr6xznikVhGxF7ASeDyQwMmZef84y/4F8KbMPLqcHqTYR/5Py2JLMnNbtVFPrMMy9QEXA8cBQ8CZmXl9TwKfwFTKFBHzgI8ABwMPAidl5q1NqqfJ9p0RsRi4AtgZuA44OzO3TqdOe62DMp0KXAT8ulz0y5n5lt5FPr6p9nEi4uPAmsxcUU43sp46KM+MraOIWA68DeijOMY7LTPva2odQUdl6qievOKhdz4GnJKZTwFuBt5QczxN8UngjMxcXL52HHUgInaJiI8Ar6s7ljpFxJ7AO4HDgcXAWRFxQL1RNUNEPBP4FrBv3bE0QUQcAzwXOIjib2VJRLyw3qikh/kA8IHM3A/4HvDW0QtERH9EvA74NDDQ8tZTgW9n5uKWf7UmHUqdlOnFwP7AAcALgBUR0YQTYpOWCXgN8EBm7g+cB6wo5zeinqa471wJnJuZ+1IcXJxZzp9K+XuuwzIdDLy2pU6ackA7aZkiYo+I+BJw/KjVG1dPHZZnRtZReQLocuDYzHwasB64oHy7cXUEHZepo3oy8dA7+2fmzWU2fE/Aqx0i5gPnZ+b6ctZ6YK8aQ2qS5cBtwD/XHUjNjqHIiN+bmQ8AV/HIndVcdSZwDvDLugNpiF8Br8vMzZm5BbgF2xM1RLnvfzZFGwbFgeoJYyy6f/nvzFHznwHsGhHfi4i1EXFkVbFOVRfKdCzw6cwcysyfAD8HDq0m2qmZRpmOpThZQmZeR1E3e9Gceppw3xkRi4AdM3NtOWsFcMI0yl+HtspUvn4GcGpE/CgiVkbEY3oY90Sm0sc5GfgC8NmRGQ2up7bKU5qpdTQInJOZd5fT64G9GlxH0GaZytcd1ZOJhx7JzC0R8RTgLuAoisz/nJaZmzJzJRRnRCiy
aZ+vNaiGyMyPZ+ZFQBPOaNVpD4oDyhG/Ap5YUyyNkplnZOY3646jKTLzppEOZ0Q8meKWi1X1RiVt9zjg95m5tZwesy0r/47PAO4d9dYwxf7xEOBVwGci4nEVxjsVnZapie37lMrE+LE3pZ4m+27He3+q5a9Du2Uaef0OiitSfgG8v7owp2XS30BmXpyZV4xar6n11G55RpadcXWUmfdk5tUAEbEj8CaKNqCpdQTtl2lk2bbrqQmXtM0qEXECxTMLWt2amcdk5o+AJ0TEK4HPUHNmv5cm+l7KeyWvpPh7fFfPg6vRRN9LHfE0UD9FR25EH8W9wNKYIuJA4MvAGzLztrrj0dwzTrt+Gw9vsDMu3gAACHVJREFUy2AabVlmfrBl8gcR8R3gMIozh5WrokzU3L53WKYxY6+7nlpM9t2O9/7o+dCcfW67ZSIzt992FxHvBn5aXZjT0u5voKn11PZveqbXUUTsAlwN/DAzryxvZ2hiHUGbZYLO68nEQ5dl5ueAz7XOi4hHRcQLMnMkW7SSOXYJ/VjfC0BE7AR8keJhTMvLS6TnjPG+F213F3BEy/RueGuBxlE+rPbfgPMyc85fVaZ6jNMPGATuiYiB8p7/3ZlGWxYRpwA3ZOZIJ68P6Nn+sooyUbTvu7dM97R977BMI7GP1MduwC/rrqdR8U207xzvu/9fYJcO6rRKbZWpPHg6PTNHkkx9wFaaod0+TlPrqa3yzPQ6iojdga8Ca4C/LWc3tY6gzTJ1o5681aI3tgCXRcSScvolFA+FU5GEuR04MTM31R2MGuda4OiI2DUiFlA8jOyammNSA0XEkyguBTzJpIOapkyqfxM4sZz1MuAr09jE0ygfNhwRQfEQ1VpvtepCmVYBJ0fEQETsQ/Gg3O92N8rpmUaZVpXvERGHAxsz8+c0p54m3HdmMXrYxjJZC3AK8JUu1GmV2ioTcD/wxvKBzADnUpzFbYK2+jgNrqd2+2wzto4iYgD4EvDZzDwvM4eh0XUEbZaJLtSTiYceKDNdJwIfioh1FA/wOKPeqOoXEQdRPETxMOD7EbEuIrwnW9uVD7Z5C/B1YB3wqcy8sd6o1FCvBx4F/EvZlqyLiLPrDkpq8WqKp4ffTHG26XyAiDg7It4+ybpvBx4fET+meBDYyzLzD5VGOzWdlOkq4CaKB5d9AXhFZj5YZbBTNJUyvQ+YHxE3UYzGdUo5vxH1NN6+MyJWRcTB5WInA5dExK3ATjw0qtiY5a9bu2Uq++AvAS6PiFuAJUAjhlqeYpnG07h6arc8M7yOng88HTi+pe8x8gyLxtURtF+mbtRT3/Dw6NtPJEmSJEmSusMrHiRJkiRJUmVMPEiSJEmSpMqYeJAkSZIkSZUx8SBJkiRJkiqzQ90BSJI0V0TEzsANwHGZeccUlv84sCYzV5TThwGXAPOAeyjG1L6zsoAlSZK6wMSD1CARsTfwU+BHLbP7gPdk5kc73PZ/AFdl5opyWNfnZOb/jbPsLsDVmbmsk89sI8aXA8dn5nG9/FypF8qxrz8M7DuFZfcAPggcDaxpeeuTwPMzc31EnE4xBN7yCsKVNIfZH7E/InWbiQepeR7MzMUjExGxJ/DjiPheZq7vxge0bn8cjwGWduOzJG13JnAO8ImRGRHxMuA8ilsf/xs4JzM3UoxH/wWKqxpGlp0PnN/SDqwH/ro3oUuag+yPSOoaEw9Sw2Xm3RFxG7BvRDwdeAWwEPhdZh4VEa8AXk1x4HIPcG5m3lqeMb0S2AO4E3j8yDYjYhjYNTN/GxFvBk4FtgK3AS8HPgbsWJ6JWAIcClwMLAA2Uxz8XFOeEXhYPC2fcRbwvMx8Xjm9H/CfwF7l572S4nLxxwIXZeblreWOiG8A78/Mq0ZPR8T+wHuAPwYGgPdm5kcjYqcy9icDQxQHcq/MzKG2vnypizLzDICIoPz/QIpkxKGZuTEi/hF4PXBhZl5cLnN4y/qbgJXl/H7gAuDzPSyCpDnM/oj9EakTPlxS
ariIOATYB/hOOetAissSj4qIIyl2mkdk5kHAu4Gry+UuA9Zm5oHAa4D9xtj28yl27Idk5p8DPwPOBU7joTMdjwauAv4mM59aft7KiPiT0fGM2vy/AodHxG7l9GmUHQiKg62/LGM+sYx7qt/HDmU8b8rMJcCRwOsj4lnAC4E/KuN+RrnKn05121KPHUXRKV1bdqqXM8bvdLSImEdxy8UOwLsqjVCSSvZHHhGz/RFpGrziQWqekcw+FL/R3wInZ+YvyjOl6zPz9+X7x1J0Am4YOYsKPCYiHgscQ3H2lMy8PSJa7xMfcQzwucy8r1zutbD93s4RzwRuz8zvlMvcFBHXA88BhkfFs11m/iEi/h14aURcQnHp+BGZeX9EHAccGxFPBhYDO03j+9kX+DPgoy1l3hE4CLgGeFd5NuJrwKWZefs0ti310gDw2cx8DUB5hmzC/XK5zBcpziYuz8wtlUcpaa6yPzIx+yPSNJh4kJrnYfdUjuH+ltcDwCcy8+9g++XXewD3UeyE+1qW3TrGtraWy1Gu/2iKMwqtBlqXKfUDgxSXOd7P+D4MfAi4BbglM38WEU8Evl3O/xbF2YKxHt40Ov55LfH8btR9p08o522MiH0oOiHLgGsj4qzM/NIEMUp1+QbF2bELgd8Al1M8zO2CCdZZCdwOnO0lu5IqZn+kYH9E6gJvtZBmtq8CfxURu5fTZ1PctwhFtv0sgIjYi+Ky7tGuBV5UDvEHxQHPayk6AAMR0UexU94vIpaW2zoQeDbFQdOEMnMtxc767yl2+gAHUxxkXQisptzJR8TAqNV/Uy5LRBwAPHVks8CDEfHS8r0nAT8GlkTEqygun1xddn6+Cjx9sjilOmTmD4G3UYxacRNFJ/ai8ZaPiIMobsc4DPh+RKyLiFW9iFWSJmF/xP6INCGveJBmsMxcHRH/BHwtIoaA3wMvyszhiDgH+FhE3ALcBawbY/1V5U70+vIywZso7nfcANxYTh8BnAC8LyIWUDwk6bTM/ElEHDqFMD8MvJWHHoK3GjidYoc9BPwXxU59n1HrXQhcGRHHArcC15Uxb46I5cB7IuKNFGc63pqZ15eXhD4HuDkiNgA/pxhuUGqMzNy75fUVwBUTLPvyltc/4OFn3SSpEeyP2B+RJtM3PDz6iiVJkiRJkqTu8FYLSZIkSZJUGRMPkiRJkiSpMiYeJEmSJElSZUw8SJIkSZKkyph4kCRJkiRJlTHxIEmSJEmSKmPiQZIkSZIkVeb/AZ1THFxdL2gxAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from sklearn.linear_model import Ridge\n", "\n", "# some values you can try out: 0.01, 0.1, 0.5, 1, 5, 10, 20, 40, 100, 200, 500, 1000, 10000\n", "alpha = 100\n", "ridge_model = Ridge(alpha=alpha).fit(X_train_df_N, y_train)\n", "\n", "print('R squared score for our original OLS model: {}'.format(r2_val[-1]))\n", "print('R squared score for Ridge with alpha={}: {}'.format(alpha, ridge_model.score(X_val_df_N,y_val)))\n", "\n", "# Compare the coefficient distributions: original OLS model (left) vs. ridge (right)\n", "fig, ax = plt.subplots(figsize=(18,8), ncols=2)\n", "ax = ax.ravel()\n", "ax[0].hist(model_N.params, bins=10, alpha=0.5)\n", "ax[0].set_title('Histogram of coefficient values for Original model with N: {}'.format(N))\n", "ax[0].set_xlabel('Coefficient values')\n", "ax[0].set_ylabel('Frequency')\n", "\n", "ax[1].hist(ridge_model.coef_.flatten(), bins=20, alpha=0.5)\n", "ax[1].set_title('Histogram of coefficient values for Ridge Model with alpha: {}'.format(alpha))\n", "ax[1].set_xlabel('Coefficient values')\n", "ax[1].set_ylabel('Frequency');" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R squared score for our original OLS model: -1.8608470610311345\n", "R squared score for Lasso with alpha=0.01: 0.5975930359800542\n" ] }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAABB4AAAHwCAYAAAAB0KxmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nOzde5wlZ10n/s9MMswkk0HI0G6CEVjFfAEVwi2wXAS5aRQMKogGVoIK6griuuC6EgQR1lX5AaKoLBBgZaP8DOAiNyMBdLnLJcr1EfkFFsgYZyeoyYRJJun5/VE1pNPT3XO6Z54+fXm/X6+80udMnarvearqnOd86qmqLYcOHQoAAABAD1unXQAAAACwcQkeAAAAgG4EDwAAAEA3ggcAAACgG8EDAAAA0I3gAQAAAOhG8LDBVdWhqrrNvOfOr6q3jH8/r6p+/Cjz+NWqOrdnnb1U1VlV9fmq+mhV3WEVlveWqjp//PuyqrrVEtN+Q1W9q3dNCyz36+u/83IeUVVfrKoPV9VJxzCfnVX1wqr6+6r6RFX9XVW9YLF5VtVtq+r9E8z3bVV1l2Oo64h9q4eqekZVvaZXPVX1A1X10vHv76+q541/T7SdVNWDq2q2qh4+7/nfq6rnTvD6u1TVe8f95eNV9T3LfQ/A6tGv0K9YYLnrpl+xWrVOoqruMO5Pf7XAv71mJd/rc7eXJaZ5cFV9cpnlHn7t1/fvufvxWO8zVjLPOfM+6vutqvdU1WOOZTnz5ndyVV1UVZ+pqlZVj17pdGPb/N7xqm0jOnHaBTBdrbVfnWCyhyT5dO9aOvmBJO9urf3Uai+4tXbWUSa5dZKzV6OWKfnRJK9orT1/pTOoqhOTvDPJB5Kc1Vq7tqpOTvIbSf6iqh7SWrth7mtaa1ckud/R5t1a+76V1rWRtNbenOTN48N7Jzl1BbO5Pslrq+qurbX/u8zX/n6SC1trF1bV3ZO8p6p2z1+vwPqgX9GPfsWx9yvWoANJqqpu31r7YoYHO5Pcf7plLWze/r2e9+PDnpvkmtbanavqdkk+UFUfaa19edLpquqMJC9Jck6SV69m8euN4GGTG4+kfrK19sKq+rUkP5jhR8S+JOcn+aEk90ry21V1Y5J3JXlZkrOSHEry9iS/0lq7oaq+L8lvJrkxyWVJHpbkAUkenOQnk+xM8i9JHpnkD5J8W5LdSa5Ocl5rrVXVe5J8NMl9k3xjkv+e5LQkDxpf/yOttU8s8D6eneTHktyQ5O+TPDXJQ5P8hyQnVNVJrbXHz3vNDUn+W4YPip3j+3jjmBR/vd7W2ndX1U+O89o6ts1TW2ufrarbJnltktsm+eJY8+H5H0oy01r7v1X1X5I8cazvc2PbvjrJSVV1WZJ7Zvix/NtJTh7XwQWttXcsVM+cZTwlyaNaa48aH98pyaVJbjcu76eT3CLDj8n/1lr7g3lt8J4kv9dau3j+46q6c5LfGdfRCUleOv44PGWs/duSzI7r66dba7Nz5vvMJI9O8rWq+oYkv5LkReM6uTHJh5L8x9ba1VX1hfHxXcd18KY5JT42ydbW2i8efmIMH34hyceT/GBV/U2S/53kM0nuML7vv2ytnTKGFH+YYXv654xfkK2188flPibJKUlekOT/S/IdSbaN7+d9VXVmhu19V5LTM2zXj2utHcgiqurA+F4fNs77ueP7+M4kV4zra39VPTALr+9tSV6a5OFJ/inJlRn2m4xt+TvjvLZlWNfPXOxHelW9JMnVrbVnV9Xp4/If0lp7d1U9IcmjMuzDj0ny60l+JsP+8i8ZttPTq+qtGbanGzLsp59ZYFH/kGEdvnqc5/w6fiDJzywS9pyQobOcDO28aNsCa59+hX5F1na/YlFV9chxvrcY2/214/fngvWN7bpg3WM7/vxY25UZ1u/fL7DYG5O8Psnjk/zX8bkfSvK/kvynObUtOL+jbC8LtvcS7/+yJP+ptXZpVf3Y+N5u3Vr7WlW
9cnx/90nyySRfy8334yS5Xw0jTv/NOM15rbX985Zx1H7VuH0+NsO+cfskX0nyxPHAUpKcO24Pp2U4OPXksc1/Jcm5SU7KsG0/o7X2prGN3pbk++bM47AfTHJekrTW/k9V/WWSH8mwbU063U8meU+GPuZKDt5sGk612BzeXcPwvMvGD5XnzZ+gqr45yS8kuXdr7V5JLklyn9bay5J8JMOPmzdl+EG0L8MPn3sluVuSZ1TV7iR/lOQJYyL/7iTfNGcR357kweOX2zlJ/rm19u9aa2cm+ZsMX+iH3aG1dv8kT0jyW0neM9b0jiRPW6D2J43zvHdr7a4ZPuxe01r7nxl+dL5+fudgdEKSa1tr98zw4XFhVc3Mr7eqHpThy/aBrbW7jzUd/hJ7WZIPtta+PcMXwp0WqO8HMnQI/l1r7TuSXD6+3ycl+drYXrdKcnGSp4/v4YlJXldV/3aB9pvrj5M8oKpOGx8/KWPHI8mTM3zI3j3J48a6JzKONLg4yS+P7fOgDOv5vhk+fHeNdd97fMm3zH19a+23MxxFf3Fr7ZlJLsjwpXi38b+tGTpDh32ytXbnBToH90vy1/Pra60dytAResD41BlJfn3cnvbMmfTZGQLWO2XosN59kbd8nyT/z9hWr85NX/5PztDxuG+SOyb5t0m+f5F5HLY9yT+21s7O0Bl4ZYZ96y5JviHDF+buLL6+/0OSM8fpH56hs3fYi5N8dFwnd09ymyS/mMW9McO+kSTfm+Qfx3kmw1G7NxyesLX2ody0vzxrfPpbxhq/M8N6WGoY5c8nObOqnjr/H1prb15ihMnPJfkvVfXlDB2InzXaAdY8/Qr9ivXar1isvi0Zfug/cdw27pvhu+k2S9S34PNV9ZAkv5Tku1trd0tyUZI/G5exkP+R5N/PefzEJK+ZU9tS81twezlKey9mfp/hq0keOC7n+3LTNpoF9uNk2D8flqEPc0aGAGW+SftVD8qw7d4lQ+Dx0jn/titD//DOY733r6rbj8t+8Li9Pyvj51Jr7YrW2lkLhA5J8s1JvjTn8ZfH2ieerrX2a62138sQPrEEwcPm8N3jDnfW+OG40DDIryT52yQfq6oXJrmstfZnC0x3Tobk+lBr7boMX8DnJPmuJJ9urf1tkrTWXpvkX+e87u9aa/86/tvFSV5TVU+rqt/JcOTilDnTvnH8/+fH/79jzuOFksRzkrx6Tqr6O0keWlW3WKgx5vm9saa/S/KJ8X3crN4MH4h3TPL+sYP1W0luXVWnZviQe804j3/IcORmvocl+dPW2lfH6X6xtfaCedPcJ8k/jD/+0lr7VJL3ZWib+fV8XWvt6gzt9YSqOiFDYv6q1to1GY4AfX9V/XqGD+BT5r9+CWcm+dYMnabLkvxVhk7H3ZO8N8m3j0cxfjnJS8b3vpRzkvxha+1gG45g/G5u+nJLhhELi9m2yPPbMxwdS4YjPh9YYJrvy9Aes2P7vXaReX2xtXbZ+PfHctN29p+T7K2qX8pwNO22mawdD/+g/3yST7TWvjK+78vHeS+1vh+W5KLW2vXjNv0/58z3kUl+elwnH80wpPY7l6jjvUnOqKp/k6ET8fwkDx/3jQdlOAKwlA/PWbeXZc6RlPnGWn8syQuq6juOMt8kSVXtyHCk5/zW2hkZ9r+Xjz9YgLVLv2Jx+hULW0v9iiOMBzQeleSeVfWcDEeyt2Q4cr5YfYs9/70Zwqm947xfk+FH+R0WWfZHk9xYVfccv/92tdbmXoNhqfkttr0s1d6LeVOSc8ag4YFjGzw8Qwjz+dbaPy7x2iT5s9bata21GzOEdQv1GSbtV13Sbhoh8ookc6//9PrW2o2ttWszjPb5xjacpvLjSR5fVf8twwjOSbbPrbmpL5kM6/zGY5iOJQgeSJKMH9oPypCg70vy4qpaKMmev+NtzfDD8IYMO+Fcc5O/aw7/UVU/m+RVSa7NkNr+8bzXXjevtoNHKf+EBWo6cYF6FjL3yOrW3PQhcs2c509
I8kdzOlj3yHBU5qvjcucuZ6EjtTfMra+qblVHXpBq/ns4XM/hH93XZHGvyPBh+71JPtNau7yG880uyzBE7b0ZjgwsZH79hztVJ2QYfjm3Y3nfDB2xyzN0mH4jyS2TvLOqjhhef5T3N/e9LfX+3pfku6rqZp9V4+PvSnL4IpLXtYWPks/fLhf7kvjanL/ntskfJ3lKhuGLL84QSkyyXc3dhhfafo+2vhfbpk5I8tg56+Q+uflRvZsZ9+u3ZAhg7pNhWzk9wxDG948dyaXMrX3+trLQ8j6WIdz44yQ7jjLvZDi15eTW2lvG138wyafGWoF1TL/i66/Tr7ipnrXQr1hQDddV+HiGdfGxJM/M8B24ZbH6lqh7obbfksUPpCTj6J4MIx/+aN6/LTW/xbaXRdt7sQLacMrRLTKMiPxckj9P8ojx8cVL1H7YJH2GSftVi+1HCy6nqu6R4QDULTOMrvrNReY73//JEH4cdtsMoxlWOh1LEDyQJKmqu2VIJz/TWvuNDB8Gh4eN3ZCbPiz/IslTq2pLVW3P8OHxlxl+IJ5ZVXcd5/fDGYb5zf+gTIbU8jWttVclaRkS5hOOofx3JPmJ8UsjGYaa/fV45ORoDl+Z9x4ZhqcdcWXhDO/5x2o4Rz4ZUtRL5yz7KeM8bpdk/pDFZBg+/kNVdcvx8XMzDI+/IcN5olsyfFjeqarOHuf17Rl+WL/naG9g/LG2JcMRp1eMT98ryd4MPwIvyXCUIuPRi7n2jtOmhjs83PXwbDOcR/mE8d++OcP2cc+xg/fqDGn0fx7b5x5HKfMdSX62qraNocHPZdhujubiJPuTvKTGK1iP///dDJ2Kow2hfGuSJ1XV1hqu93BeFt4mF/M9SZ7XWnv9+Pg+ObZt9bCl1vfbk/x4Ve0YRwQ8bs7r/iLJf5yz/705SwQPozdmGKL5idba9RmOhvxG5pxmMcfcfX2lXpjhlI4nTDDtPyT5hqq6X5JU1bdmOMXk48dYAzBl+hX6FWu0X7GYb8vwo/WC1tqfZxgZsj1Dey5Y3xJ1vyPJjx4+zWY8dWdfhu+8xbwuw0GBx2UIz+a/18Xmt9j2smh7H6Ud3pThOiWXtNY+m+EU0cfnplFDc62kzzBpv+qhVXX41KqfyRCCLOW7knyktfaiDPvcoxeZ73z/Kze13xkZwraF7noy6XQsQfBAkmQcyvj/JvlIVX0kyU/kpnPH35zkN6rqiRm+fL8xw/DBT2T4YHtBa+2qDMOs/0dVfSzDB8sNGY4+zPfCDMPF/y7DULiPZUiMV+pVGb6EP1xVn8nwob/QuZcLuf9Y74UZLm7z1fkTtNYOJ6d/OdZ8XpIfGofl/VySu4zLfVWGowHzX/+2DF9M76uqT2S4GM6zMlyL4MMZjvAeyvCF87vjNBcleVJb+EJEC3lFhvMNDw9jvSRDEtsyXHTxdhk6A/Pb+flJHlHDbZWel/F6CuMP1HOT/NT4ni9J8uzW2vsynIt4QpJPV9VHM3wpvTRLe36GH6OXjfVsS/L0o72pcRTDIzKEDB8d6/zY+PjhExy1+o0MFyv8RIZt5J+y8Da5mF9J8qZxnbw8w5fZsWyrSZI23PlhsfX98gznTX5yXN7lc1768xmGfX4iyeFhvEc7x/adGZL5wx2yv8hw4aeFvsTfleR7qup3V/C2knx9uOqPZ7i4W5Kv37LziNM6Wmv/nOEc2d8Z2+HiJE9prX1+/rTA+qJfoV+RNdivGH1vVV0z578vZ/hOfUuSz45t/6gMFwu84xL1Lfh8a+0vMwRt76qqT2W4ZsMj25yLZc7XWvvK+D4+N277c/9tqfktuL0cpb2X8qYMgdnhPsNfJtnTWvvSAtPO3Y8nNWm/6stJ/mh8X3fIcL2YpfxxktuM0386Qz/x1KraVcNt1i+r4SKT8z0nySlju74zwzUrPp8kVfXKqvqZo03H5LYcOrScg3+wsDF1vyDJc9tw14F7ZDjafNvxi3TNqTlXh552LfRRVT+a5F9ba28
bj4i8IUOK/wdHeSkAU6RfAZtTDXe1eExr7ZHTroXjy4gHjovxAkXXJ/mbGi5i8/IMt6hak50DNo1PJnnWuE1+MsPtJF853ZIAOBr9CoCNxYgHAAAAoBsjHgAAAIBuTpx2AcuwPcPVkPfEfVMBYK4TMtwq9W8y79aBHHf6IwBwpCX7IuspeLh3hisVAwALe2CS9067iA1OfwQAFrdgX2Q9BQ97kuSrX92f2dn1d12K3btPyb5910y7jDVHuxxJmyxMuxxJmyxsM7bL1q1bcutb70zG70q6WpP9kc243a9F1sPaYD2sDdbD2rBa6+FofZH1FDzcmCSzs4fW1Bf9cqzXunvTLkfSJgvTLkfSJgvbxO1i6H9/a7Y/stbq2aysh7XBelgbrIe1YZXXw4J9EReXBAAAALoRPAAAAADdCB4AAACAbgQPAAAAQDeCBwAAAKAbwQMAAADQjeABAAAA6EbwAAAAAHRzYs+ZV9Wjkjwnyc4kl7TWnl5VD0vyoiQnJXl9a+2CnjUAAAAA09NtxENVfUuSP0zy6CR3TXKPqjonyYVJzk1y5yT3Hp8DAAAANqCep1r8YIYRDV9urR1M8rgk1yb5XGvt8tbaDUlel+SxHWsAAAAApqjnqRZ3THJ9Vb05ye2SvCXJp5LsmTPNniRnLGemu3efctwKXG0zM7umXcKapF2OpE0Wpl2OpE0Wpl0AANaOnsHDiUm+K8mDk1yT5M1Jvpbk0JxptiSZXc5M9+27JrOzh44+4RozM7Mre/dePe0y1hztciRtsjDtciRtsrDN2C5bt25Z18E8ALCx9Qwe/jHJO1tre5Okqt6U4bSKG+dMc1qSKzrWAAAAAExRz+DhLUleW1W3SnJ1knOSXJzkl6vqjkkuT3JehotNAgAAABtQt4tLttY+lOS3krw3yaeTfDHJHyQ5P8kbxuc+myGMAAAAADagniMe0lq7MEeOaLg0yd16LhcAAABYG3reThMAAADY5LqOeAA2rhuTXHdwWTelOSZXXnVtDqxgedu3bc0JHeoBYHNYzvfdpN9VvpuAzUbwAKzIdQdnc8kHv7Bqy9u5c3v2779u2a97xH3vkJO3GdwFwMos5/tu0u8q303AZuMTDwAAAOhG8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuBA8AAABAN4IHAAAAoBvBAwAAANDNidMuAACgp6p6VJLnJNmZ5JLW2tOr6mFJXpTkpCSvb61dMM0aAWAjM+IBANiwqupbkvxhkkcnuWuSe1TVOUkuTHJukjsnuff4HADQgeABANjIfjDDiIYvt9YOJnlckmuTfK61dnlr7YYkr0vy2GkWCQAbmVMtAICN7I5Jrq+qNye5XZK3JPlUkj1zptmT5IzlzHT37lOOW4HHy8zMrmmXsCFdedW12blz+8TTTzLtjh3bMnPqycdSFkdhf1gbrIe1YS2sB8EDALCRnZjku5I8OMk1Sd6c5GtJDs2ZZkuS2eXMdN++azI7e+joE66SmZld2bv36mmXsSEdODib/fuvm2janTu3TzTtgQMHra+O7A9rg/WwNqzWeti6dcuSobzgAQDYyP4xyTtba3uTpKrelOG0ihvnTHNakiumUBsAbAqCBwBgI3tLktdW1a2SXJ3knCQXJ/nlqrpjksuTnJfhYpMAQAcuLgkAbFittQ8l+a0k703y6SRfTPIHSc5P8obxuc9mCCMAgA6MeAAANrTW2oU5ckTDpUnuNoVyAGDTMeIBAAAA6EbwAAAAAHQjeAAAAAC6ETwAAAAA3QgeAAAAgG4EDwAAAEA3ggcAAACgG8EDAAAA0I3gAQAAAOhG8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuBA8AAABAN4IHAAAAoBvBAwAAANCN4AEAAADoRvAAAAAAdCN4AAAAALoRPAAAAADdCB4AAACAbgQPAAAAQDeCBwAAAKAbwQMAAADQjeABAAAA6EbwAAAAAHQjeAAAAAC6ETwAAAAA3Qg
eAAAAgG4EDwAAAEA3ggcAAACgG8EDAAAA0I3gAQAAAOhG8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuBA8AAABAN4IHAAAAoBvBAwAAANCN4AEAAADo5sSeM6+qdyf5xiQHx6d+Osm3JrkgybYkL2mtvaxnDQAAAMD0dAseqmpLkjOT3L61dsP43Dcl+ZMk90xyXZL3V9W7W2uf7lUHAAAAMD09RzzU+P9Lqmp3klckuTrJu1prVyVJVV2c5DFJntexDgAAAGBKegYPt05yaZKnZTit4j1JXp9kz5xp9iQ5ezkz3b37lONU3uqbmdk17RLWJO1ypPXQJldedW127ty+qstcyfJ27NiWmVNP7lDN2rAetpVp0C4AAGtHt+ChtfaBJB84/LiqXpXkRUmeP2eyLUlmlzPfffuuyezsoeNS42qamdmVvXuvnnYZa452OdJ6aZMDB2ezf/91q7a8nTu3r2h5Bw4cXBftuRLrZVtZbZuxXbZu3bKug3kAYGPrdleLqnpAVT10zlNbknwhyelznjstyRW9agAAAACmq+epFrdK8ryqul+GUy2emOQJSV5XVTNJ9if54SRP6VgDAAAAMEXdRjy01t6S5K1JPp7ko0kubK29L8mzkrw7yWVJLmqtfbhXDQAAAMB09RzxkNbas5M8e95zFyW5qOdyAQAAgLWh24gHAAAAAMEDAAAA0I3gAQAAAOhG8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuBA8AAABANydOuwAAgJ6q6t1JvjHJwfGpn07yrUkuSLItyUtaay+bUnkAsOEJHgCADauqtiQ5M8ntW2s3jM99U5I/SXLPJNcleX9Vvbu19unpVQoAG5fgAQDYyGr8/yVVtTvJK5JcneRdrbWrkqSqLk7ymCTPm06JALCxCR4AgI3s1kkuTfK0DKdVvCfJ65PsmTPNniRnL2emu3efcpzKO35mZnZNu4QN6cqrrs3Ondsnnn6SaXfs2JaZU08+lrI4CvvD2mA9rA1rYT0IHgCADau19oEkHzj8uKpeleRFSZ4/Z7ItSWaXM999+67J7Oyh41Lj8TAzsyt791497TI2pAMHZ7N//3UTTbtz5/aJpj1w4KD11ZH9YW2wHtaG1VoPW7duWTKUd1cLAGDDqqoHVNVD5zy1JckXkpw+57nTklyxmnUBwGZixAMAsJHdKsnzqup+GU61eGKSJyR5XVXNJNmf5IeTPGV6JQLAxmbEAwCwYbXW3pLkrUk+nuSjSS5srb0vybOSvDvJZUkuaq19eHpVAsDGZsQDALChtdaeneTZ8567KMlF06kIADYXIx4AAACAbgQPAAAAQDeCBwAAAKAbwQMAAADQjeABAAAA6EbwAAAAAHQjeAAAAAC6ETwAAAAA3QgeAAAAgG4EDwAAAEA3ggcAAACgG8EDAAAA0I3gAQAAAOhG8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuBA8AAABAN4IHAAAAoBvBAwAAANCN4AEAAADoRvAAAAAAdCN4AAAAALoRPAAAAADdCB4AAACAbgQPAAAAQDeCBwAAAKAbwQMAAADQjeABAAAA6EbwAAAAAHQjeAAAAAC6ETwAAAAA3QgeAAAAgG4EDwAAAEA3ggcAAACgG8EDAAAA0I3gAQAAAOhG8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuBA8AAABAN4IHAAAAoBvBAwAAANDNib0XUFUvTHKb1tr5VXVWklcmuWWSv07yM621G3rXAAAAAExH1xEPVfXQJE+c89Trkjy1tXZmki1Jntxz+QAAAMB0dQsequrUJC9I8l/Hx7dPclJr7YPjJK9J8theywcAAACmr+eIh5cneVaSr46Pb5tkz5x/35PkjI7LBwAAAKasyzUequqnknyptXZpVZ0/Pr01yaE5k21JMrvcee/efcqxFzglMzO7pl3CmqRdjrQe2uTKq67Nzp3bV3WZK1nejh3bMnPqyR2qWRv
Ww7YyDdoFAGDt6HVxycclOb2qLktyapJTMoQOp8+Z5rQkVyx3xvv2XZPZ2UNHn3CNmZnZlb17r552GWuOdjnSemmTAwdns3//dau2vJ07t69oeQcOHFwX7bkS62VbWW2bsV22bt2yroN5AGBj63KqRWvt4a2172itnZXkV5O8ubX2pCQHqur+42T/PsnbeywfAAAAWBu63tViAY9P8uKq+myGURAvXeXlAwAAAKuo16kWX9dae02GO1iktfa3Sc7uvUwAAABgbVjtEQ8AAADAJiJ4AAAAALoRPAAAAADdCB4AAACAbgQPAAAAQDfd72oBADBtVfXCJLdprZ1fVWcleWWSWyb56yQ/01q7YaoFAsAGZsQDALChVdVDkzxxzlOvS/LU1tqZSbYkefJUCgOATULwAABsWFV1apIXJPmv4+PbJzmptfbBcZLXJHnsdKoDgM3BqRYAwEb28iTPSvLN4+PbJtkz59/3JDljuTPdvfuUY6/sOJuZ2TXtEjakK6+6Njt3bp94+kmm3bFjW2ZOPflYyuIo7A9rg/WwNqyF9SB4AAA2pKr6qSRfaq1dWlXnj09vTXJozmRbkswud9779l2T2dlDR59wlczM7MrevVdPu4wN6cDB2ezff91E0+7cuX2iaQ8cOGh9dWR/WBush7VhtdbD1q1blgzlBQ8AwEb1uCSnV9VlSU5NckqG0OH0OdOcluSKKdQGAJuGazwAABtSa+3hrbXvaK2dleRXk7y5tfakJAeq6v7jZP8+ydunViQAbAKCBwBgs3l8khdX1WczjIJ46ZTrAYANzakWAMCG11p7TYY7WKS19rdJzp5mPQCwmRjxAAAAAHQjeAAAAAC6ETwAAAAA3QgeAAAAgG4EDwAAAEA3ggcAAACgG8EDAAAA0I3gAQAAAOhG8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuBA8AAABAN4IHAAAAoBvBAwAAANCN4AEAAADoRvAAAAAAdCN4AAAAALqZKHioqqdV1S17FwMAsBj9EQBYnyYd8XDXJH9fVa+sqnv1LAgAYBH6IwCwDk0UPLTWnpzk25J8JMnvV9XfVNVPVNWOrtUBAIz0RwBgfZr4Gg+ttauT/LuzyGgAAB73SURBVGmSi5LsTvJzSVpVPapTbQAAN6M/AgDrz6TXeHhoVb0+yd8nuVOSR7fW7pnkIUle3rE+AIAk+iMAsF6dOOF0L0vy+0me0lr7l8NPttY+X1Wv6FIZAMDN6Y8AwDq0nItL7mut/UtVnVZVv1BVW5OktfacfuUBAHyd/ggArEOTBg+/l+SR49+zSR6Y5CVdKgIAWJj+CACsQ5MGD/drrf1YkrTW/inJY5N8d7eqAACOpD8CAOvQpMHDtqq6xZzHk14bAgDgeNEfAYB1aNIv7Lcm+Yuq+qMkh5KcNz4HALBa9EcAYB2aNHh4Zob7ZJ+b5IYkb4zbVgEAq0t/BADWoYmCh9bajUleOv4HALDq9EcAYH2aKHioqkdnuGr0rZNsOfx8a+2WneoCALgZ/REAWJ8mPdXiN5P8YpKPZTinEgBgtemPAMA6NGnw8M+ttTd2rQQAYGn6IwCwDk16O80PVdU5XSsBAFia/ggArEOTjnj4viRPrarrk1yf4bzKQ86pBABWkf4IAKxDkwYPD+1aBQDA0emPAMA6NNGpFq21Lya5d5InJ9mb5H7jcwAAq0J/BADWp4mCh6r65SQ/m+RHkpyU5DlV9eyehQEAzKU/AgDr06QXl/zRDOdV7m+t7Uty3yTndasKAOBI+iMAsA5NGjwcbK1dd/hBa+2fkxzsUxIAwIL0RwBgHZr04pJfqqrvT3KoqrYneUYS51QCAKtJfwQA1qFJg4enJvmjJHdNsj/JB5M8vldRAAAL0B8BgHVoouChtXZFkodW1clJTmitXd23LACAm9MfAYD1aaLgoap+cd7jJElr7UUdagIAOIL+CACsT5OeavGdc/6+RZIHJbn0+JcDALAo/REAWIc
mPdXiSXMfV9Vtk7yqS0UAAAvQHwGA9WnS22nezHiO5R2ObykAAJPTHwGA9WEl13jYkuReSf5pgtc9L8ljkhxK8qrW2ouq6mFJXpTkpCSvb61dsOyqAYBNZ6X9EQBgulZyjYdDSf5Pkmcu9YKqelCSh2S45dW2JJ+uqkuTXJjhnMwvJXlrVZ3TWnv7cgsHADadZfdHAIDpW9E1HiZ8zV9V1Xe31m6oqm8al3WrJJ9rrV2eJFX1uiSPTSJ4AACWtJL+CAAwfZOeavHuDEcWFtRae8gizx+sql9L8owkf5rktkn2zJlkT5IzJq42ye7dpyxn8jVlZmbXtEtYk7TLkdZDm1x51bXZuXP7qi5zJcvbsWNbZk49uUM1a8N62FamQbtsTCvtjwAA0zXpqRYfSXKXJP89yfVJfnx87Z8c7YWttedU1W8m+fMkZ+bmHYYtSWaXU/C+fddkdnbRPseaNTOzK3v3Xj3tMtYc7XKk9dImBw7OZv/+61ZteTt3bl/R8g4cOLgu2nMl1su2sto2Y7ts3bplXQfzy7Di/ggAMD2TBg8PSPKA1tqNSVJVf5Hkg621Nyz2gqq6U5IdrbXLWmvXVtUbM1xo8sY5k52W5IqVlQ4AbDLL7o8AANM3afAwk2RHkv3j411JjjZ2+VuS/FpVPSDDKIdzk7w8yW9X1R2TXJ7kvAwXmwQAOJqV9EcAgCmbNHi4KMkHx1ELW5L8SJLfWeoFrbW3VdXZST6eYZTDG1prf1JVe5O8IUPH4W1JLl5p8QDAprLs/ggAMH2T3tXiV6vq4xluj/m1JD/dWvurCV733CTPnffcpUnutuxKAYBNbaX9EQBgurYuY9qvJPlkkmdnuKATAMBq0x8BgHVmouChqp6U5NVJfinJNyT5X1X15J6FAQDMtdL+SFU9r6o+XVWfqqpfHJ97WFX9XVV9rqqe37dyANjcJh3x8LQk/y7Jv7bW/inJPZP8QreqAACOtOz+SFU9KMOpGXdNcq8kT6uqu2W4uPW5Se6c5N5VdU7PwgFgM5s0eLixtfavhx+01r6U5IY+JQEALGjZ/ZHxGhDf3Vq7Ick3Zri+1a2SfK61dvn4/OuSPLZf2QCwuU16V4urquqsDLfFTFU9PslV3aoCADjSivojrbWDVfVrSZ6R5E+T3DbJnjmT7ElyxnIK2b37lOVMvipmZnZNu4QN6cqrrs3Ondsnnn6SaXfs2JaZU90Jtif7w9pgPawNa2E9TBo8PD3DbS+/tar2ZLiS9LndqgIAONKK+yOttedU1W8m+fMkZ2YML0Zbkswup5B9+67J7Oyho0+4SmZmdmXv3qunXcaGdODgbPbvv26iaXfu3D7RtAcOHLS+OrI/rA3Ww9qwWuth69YtS4bykwYPJ2e4BeaZSU5I0lprB4+9PACAiS27P1JVd0qyo7V2WWvt2qp6Y5LHJLlxzmSnJbmiU80AsOlNGjz8z9banZN8pmcxAABLWEl/5FuS/FpVPSDDKIdzk7w8yW9X1R2TXJ7kvAwXmwQAOpg0ePi7qjovyXuTXHP4ydaa6zwAAKtl2f2R1trbqursJB/PMMrhDa21P6mqvUnekGRHkrdlOIUDAOhg0uDh3Bx5tedDGYY5AgCshhX1R1prz03y3HnPXZrhtA0AoLOJgofW2o7ehQAALEV/BADWp61L/WNV/fc5f9+mfzkAADenPwIA69uSwUOSe835+5KehQAALEJ/BADWsaMFD1sW+RsAYLXojwDAOna04GGuQ92qAACYjP4IAKwzR7u45NaqunWGowsnzPk7idtpAgCrQn8EANaxowUP35nk/+amL/d9c/7N7TQBgNWgPwIA69iSwUNrbTmnYgAAHHf6IwCwvvkiBwAAALoRPAAAAADdCB4AAACAbgQPAAAAQDeCBwAAAKAbwQMAAADQjeABAAAA6EbwAAAAAHQjeAAAAAC6ETwAAAAA3QgeAAAAgG4EDwAAAEA3ggcAAACgG8EDAAAA0I3gAQAAAOh
G8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuBA8AAABAN4IHAAAAoBvBAwAAANCN4AEAAADoRvAAAAAAdCN4AAAAALoRPAAAAADdCB4AAACAbgQPAAAAQDeCBwAAAKAbwQMAAADQjeABAAAA6EbwAAAAAHQjeAAAAAC6ETwAAAAA3QgeAAAAgG4EDwAAAEA3ggcAAACgG8EDAAAA0I3gAQAAAOhG8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuTuw586p6TpIfGR++tbX2S1X1sCQvSnJSkte31i7oWQMAAAAwPd1GPIwBwyOS3D3JWUnuWVU/luTCJOcmuXOSe1fVOb1qAAAAAKar56kWe5L8p9ba9a21g0k+k+TMJJ9rrV3eWrshyeuSPLZjDQAAAMAUdTvVorX2qcN/V9W3ZTjl4nczBBKH7UlyxnLmu3v3KcelvmmYmdk17RLWJO1ypPXQJldedW127ty+qstcyfJ27NiWmVNP7lDN2rAetpVp0C4AAGtH12s8JElVfXuStyZ5ZpIbMox6OGxLktnlzG/fvmsyO3vo+BW4SmZmdmXv3qunXcaao12OtF7a5MDB2ezff92qLW/nzu0rWt6BAwfXRXuuxHrZVlbbZmyXrVu3rOtgHgDY2Lre1aKq7p/k0iS/3Fp7bZIvJzl9ziSnJbmiZw0AAADA9HQb8VBV35zkz5I8rrX2rvHpDw3/VHdMcnmS8zJcbBIAAADYgHqeavGMJDuSvKiqDj/3h0nOT/KG8d/eluTijjUAAJuYW3sDwPT1vLjk05M8fZF/vluv5QIAJEfc2vtQkneMt/b+zSQPSvKlJG+tqnNaa2+fXqUAsLF1vcYDAMAUubU3AKwB3e9qAQAwDb1u7Z2szdt7u41sH8u9ffQk0270Wz2vBfaHtcF6WBvWwnoQPAAAG9rxvrV3svZu770ZbyO7WpZz++hJb/28kW/1vBbYH9YG62FtWK31cLRbezvVAgDYsNzaGwCmz4gHAGBDcmtvAFgbBA8AwEbl1t4AsAYIHgCADcmtvQFgbXCNBwAAAKAbwQMAAADQjeABAAAA6EbwAAAAAHQjeAAAAAC6ETwAAAAA3QgeAAAAgG4EDwAAAEA3ggcAAACgG8EDAAAA0I3gAQAAAOhG8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuBA8AAABAN4IHAAAAoBvBAwAAANCN4AEAAADoRvAAAAAAdCN4AAAAALoRPAAAAADdCB4AAACAbgQPAAAAQDeCBwAAAKAbwQMAAADQjeABAAAA6EbwAAAAAHQjeAAAAAC6ETwAAAAA3QgeAAAAgG4EDwAAAEA3ggcAAACgG8EDAAAA0I3gAQAAAOhG8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuBA8AAABAN4IHAAAAoBvBAwAAANCN4AEAAADoRvAAAAAAdCN4AAAAALoRPAAAAADdCB4AAACAbgQPAAAAQDeCBwAAAKAbwQMAAADQjeABAAAA6EbwAAAAAHQjeAAAAAC6ETwAAAAA3ZzYewFVdcsk70/yyNbaF6rqYUlelOSkJK9vrV3QuwYAAABgOrqOeKiq+yR5b5Izx8cnJbkwyblJ7pzk3lV1Ts8aAAAAgOnpfarFk5P8XJIrxsdnJ/lca+3y1toNSV6X5LGdawAAAACmpOupFq21n0qSqjr81G2T7JkzyZ4kZyxnnrt3n3JcapuGmZld0y5hTdIuR1oPbXLlVddm587tq7rMlSxvx45tmTn15A7VrA3rYVuZBu3CXE77BIDp6n6Nh3m2Jjk05/GWJLPLmcG+fddkdvbQ0SdcY2ZmdmXv3qunXcaao12OtF7a5MDB2ezff92qLW/nzu0rWt6BAwfXRXuuxHrZVlbbZmyXrVu3rOtgvqfxtM9X5MjTPh+U5EtJ3lpV57TW3j69KgFgY1vtu1p8Ocnpcx6flptOwwAAON6c9gkAU7baIx4+lKSq6o5JLk9yXoa
jDgAAx12P0z6TtXnqp1OM+ljuqYWTTLvRTwNcC+wPa4P1sDashfWwqsFDa+1AVZ2f5A1JdiR5W5KLV7MGAGBTO+bTPpO1d+rnZjzFaLUs59TCSU8L3MinAa4F9oe1wXpYG1ZrPRzttM9VCR5aa3eY8/elSe62GssFAJjHaZ8AsMpW+1QLAIBpctonAKyy1b64JADA1LTWDiQ5P8Npn59O8tk47RMAujLiAQDY8Jz2CQDTY8QDAAAA0I3gAQAAAOhG8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuBA8AAABAN4IHAAAAoBvBAwAAANCN4AEAAADoRvAAAAAAdCN4AAAAALoRPAAAAADdCB4AAACAbgQPAAAAQDeCBwAAAKAbwQMAAADQjeABAAAA6EbwAAAAAHRz4rQLAOhp69Ytufbg7LTLWNL2bVtzwrSLAACATgQPwIZ2/Q2zedeHvzjtMpb0iPveISdvMwANAICNSU8XAAAA6EbwAAAAAHQjeAAAAAC6ETwAAAAA3QgeAAAAgG4EDwAAAEA3ggcAAACgG8EDAAAA0I3gAQAAAOhG8AAAAAB0I3gAAAAAuhE8AAAAAN0IHgAAAIBuBA8AAABAN4IHAAAAoBvBAwAAANCN4AEAAADoRvAAAAAAdCN4AAAAALo5cdoFAEe6Mcl1B2enXcaSDk27AAAAYF0QPMAadN3B2VzywS9Mu4wlPeTs20+7BAAAYB1wqgUAAADQjeABAAAA6EbwAAAAAHQjeAAAAAC6ETwAAAAA3birBQAbxo1Jrrzq2hxY47ej3b5ta06YdhEAAKtE8ADAhnHdwdm872Nfyf791027lCU94r53yMnbDDoEjp8bM3wGHk/rISRd6H0fawC9Ht43rDeCBwAAWOeuOzibSz74heM6z/UQki70vnfu3H5MAfR6eN+w3tijAAAAgG4EDwAAAEA3ggcAAACgG8EDAAAA0M2mv7hkjysAL+RYrq7ryrqwsW3duiXXruDzYbVvG3mLbVtz/Rq/TeWhaRcAAMARNn3w0OMKwAs5lqvrurIubGzX3zCbd334i8t+3bFetXu5HnL27VdU52p6yNm3n3YJsCLHeiBkfhDZ46BFj4M1PQLN4z3PHoHmSgPnpQheYWNb6WfwUgeqVvMA96YPHgAApu1YD4TMDyJ7HLTocbCmR6B5vOfZI9BcaeC8FMErbGwr/Qxe6kDVah7gdhgdAAAA6GYqIx6q6rwkFyTZluQlrbWXTaMOAGDz0h8BgNWx6iMequqbkrwgyQOSnJXkKVV1l9WuAwDYvPRHAGD1TGPEw8OSvKu1dlWSVNXFSR6T5HlHed0JyXAxnuPphBO25JSTtx3XeS7k5B3bsuXQyi4idMIJW477+15LNvJ7W6nV2i6PxYknbF3VGle6D612nSux0hqP5XNlJdZLW+5c5XZZieP9uT5nXm6CNLkN1R+Z/3nQo+/Q47upx+fK8Z7ncuY36efyenjfyfrogy60XR7r9+N6eN/rhXY8flb6GbzU/nA8t/Wj9UW2HDq0utfArar/kmRna+2C8fFPJTm7tfaUo7z0AUn+d+/6AGAde2CS9067iPVAfwQAuliwLzKNEQ9bc/M7/mxJMkkk+TcZ3sSeDHcTAQAGJyQ5PcN3JZPRHwGA42fJvsg0gocvZ/jCPuy0JFdM8Lrr4igOACzm89MuYJ3RHwGA42vRvsg0god3JnluVc0k2Z/kh5McbVgjAMDxpD8CAKtk1e9q0Vr7SpJnJXl3ksuSXNRa+/Bq1wEAbF76IwCwelb94pIAAADA5rHqIx4AAACAzUPwAAAAAHQjeAAAAAC6ETwAAAAA3QgeAAAAgG5OnHYBm0VVPTDJS5LcIsnlSZ7YWvvqdKuavqq6f5IXZ2iXfUl+orX2xelWtXZU1a8nubG19txp1zItVXVekguSbEvyktbay6Zc0ppRVbdM8v4kj2ytfWHK5UxdVT0nyY+MD9/aWvuladYDvVTV7ZL/v70
7DZKrKsM4/g8BFEQRFYEgiAp5wLgkREHZl5QfCBhFERFBQDYNImrcQbEKEUVB3AATgUhcQUFQZDPiAkYrAkZZHhPLja1Ko4KoSDDxwzkTmmFmenC6b7eZ51dF0ff26dvv9M3t8/bZLguApwMGDrZ9/6AymwHnA5sCK4E5thc2Heuaql3dJGkqMA94EvBD4FjbDzUe6BpuFOdhFvAhYAIl/z48+XfnjTZXkzQT+IztZzUZ33gxiutBwLnARsA9wGubvB4y4qE55wOH2H4+cCvwzh7H0y++BBxpe2p9/Kkex9MXJG0o6QvAO3odSy9J2hz4MLALMBU4WtJzextVf5C0I/BjYHKvY+kHkmYALwOmUf6tTJf0yt5GFdE1nwM+Z3tbYDFw0hBlTgcur/XrQcCXJU1sMMY11ijrpgXAcbYnU370HtVslGu+duehNs6fDcy0/UJgCXByD0Jdo402V5O0CfBxyvUQHTaK62ECcBlwWr0ebgLe02SMaXhozna2b5W0DrA5MO5bWyU9DjjR9pK6awmwZQ9D6iezgKXAJ3odSI/NABba/ovtfwAXA6/ucUz94ihgNnBXrwPpE3cD77D9oO0VwG3k+yTWQDWP2I3yfQhwAXDAEEUvAb5cHy8DHg9s0O34xokR6yZJzwTWs72o7rqAoc9RjE27HGEdYLbtO+t28szuGG2uNo8y+iS6o9152B74h+0r6/apQKOjiDPVoiG2V0h6PnAtsAJ4X49D6jnb/6b0CCBpLUor9KW9jKlf2P4igKSTexxKr02i/KAccDewQ49i6Su2jwQoo+bC9i0DjyVtQ5lysXPvIoromqcB97UM278beMbgQra/0bI5B7jJ9r0NxDcetKubhnr+UecoxmzE82B7OaUBDknrUXp3P91kgONE21xN0vHAjcAiolvanYetgXvqiOpplA6atzQXXhoeOk7SAZQ1C1rdbnuG7V8Cm0g6BvgasFPjAfbISJ+LpHWB+ZR/j6c2HlwPjfS59CKePrQWsKplewJlrnLEkCRNAb4DvNP20l7HEzEWw9QRS3nk9yKM8L0o6QTgGGD3zkY3rrWrm1J3NWNUn7OkDSkNEL+wPb+h2MaTEc+DpOcBrwL2Jg1w3dTuelgb2APYzfbiuo7cGcBhTQWYhocOs30RcFHrPkmPl/QK2wO9+QsYZ0Poh/pcACRtQJlvtByYVYdIjxvDfS6x2h3Ari3bm5KpBTGMuljtN4ATbH+11/FEjNUwOcU6wHJJE23/B9iMYb4XJX0MmElJNO/odrzjSLu66Q7KeRnu+eiMtjlCXWT1KmAh8LbmQhtX2p2HAyjXw2LKYvKTJP3IdutrYuzanYd7gKW2F9ftr/DwlL1GZI2HZqwAPitpet1+DWVRuCiNMMuAA+vUi4hW1wJ7S9pY0vqUFvMr27wmxiFJW1Cmar0ujQ6xJqsN9D8CDqy7DgW+O7hcHemwJ7BzGh06bsS6qd6d64HaGApwCEOcoxizEc9DXUz1cuDrtk+wPXikUHRGu+vhg7Yn14Vu9wHuSqNDV7TLmW8ANpb0wrq9H/DzJgPMiIcG2P6PpAOBz9cvwTuBI3scVs9JmkZZRPFW4MY6V/0u2/v0NLDoG7bvlPR+4PuUVvJ5tn/W47CiP82hLJ53Rsu6F+fYPqd3IUV0zZuB+ZJOBP5AuWsFko6lzPP9YP3vPuC6lmtiH9vpeR+j4eomSVcAH6g9igcDc+udFW4kd+3quHbnAdiCsqDe2pIGFtlbPLBGUnTGKK+H6LLRnId6t6+5kp5AGSFxSJMxTli1Ko1/EREREREREdEdmWoREREREREREV2ThoeIiIiIiIiI6Jo0PERERERERERE16ThISIiIiIiIiK6Jne1iIiIaEhd4f4GYF/bvxtF+S8CC21fULd3Bs6krFi9HDii3rovIiIiom+l4SGij0jaCvgN8MuW3ROAs2yfN8Zjfxu42PYFkm4G9rD9t2HKbghcYnuvsbzn/xDjYcCrbe/
b5PtGNEHSjsBcYPIoyk4CzgX2Bha2PPUl4OW2l0g6gnKLvlldCDcixrHkI8lHIjotDQ8R/edftqcObEjaHPiVpMW2l3TiDVqPP4yNgB068V4RsdpRwGzgwoEdkg4FTqBMffw5MNv2A8DBwLcooxoGyj4OOLHle2AJ8JZmQo+IcSj5SER0TBoeIvqc7TslLQUmS9oeeCPwBOBe23tKeiPwZsoPl+XAcbZvrz2m84FJwO+Bpw8cU9IqYGPbf5b0XuANwEPAUuAw4HxgvdoTMR3YCTgdWB94kPLj58raI/CIeFre42hgP9v71e1tge8BW9b3O4YyXPwpwGm2z279uyVdB3zG9sWDtyVtB5wFPBWYCHzK9nmSNqixbwOspPyQO8b2yv/pw4/oINtHAkii/n8KpTFiJ9sPSPoIMAc4xfbptcwuLa//N7Cg7l8LOBm4tME/ISLGseQjyUcixiKLS0b0OUkvBbYGflp3TaEMS9xT0u6USnNX29OAjwGX1HKfBRbZngIcD2w7xLFfTqnYX2r7ecBvgeOAw3m4p+PJwMXAW22/oL7fAknPGhzPoMN/BdhF0qZ1+3BqAkH5sbVPjfnAGvdoP4+1azzvsT0d2B2YI+klwCuBJ9a4X1xf8uzRHjuiYXtSktJFNamexRDX6WCS1qVMuVgbOLWrEUZEVMlHHhVz8pGIxyAjHiL6z0DLPpRr9M/Awbb/WHtKl9i+rz4/k5IE3DDQiwpsJOkpwAxK7ym2l0lqnSc+YAZwke2/1nJvh9VzOwfsCCyz/dNa5hZJ1wN7AKsGxbOa7b9L+ibweklnUoaO72r7fkn7AjMlbQNMBTZ4DJ/PZOA5wHktf/N6wDTgSuDU2htxDfBJ28sew7EjmjQR+Lrt4wFqD9mI9XItcxmlN3GW7RVdjzIixqvkIyNLPhLxGKThIaL/PGJO5RDub3k8EbjQ9rth9fDrScBfKZXwhJayDw1xrIdqOerrn0zpUWg1sbVMtRawDmWY4/0Mby7weeA24Dbbv5X0DOAndf+PKb0FQy3eNDj+dVviuXfQvNNN6r4HJG1NSUL2Aq6VdLTty0eIMaJXrqP0jp0C/Ak4m7KY28kjvGYBsAw4NkN2I6LLko8UyUciOiBTLSL+v10FHCRps7p9LGXeIpTW9qMBJG1JGdY92LXA/vUWf1B+8LydkgBMlDSBUilvK2mHeqwpwG6UH00jsr2IUll/gFLpA7yI8iPrFOBqaiUvaeKgl/+plkXSc4EXDBwW+Jek19fntgB+BUyX9CbK8Mmra/JzFbB9uzgjesH2L4APUe5acQsliT1tuPKSplGmY+wM3CjpZklXNBFrREQbyUeSj0SMKCMeIv6P2b5a0keBayStBO4D9re9StJs4HxJtwF3ADcP8foraiV6fR0meAtlvuM/gZ/V7V2BA4BPS1qfskjS4bZ/LWmnUYQ5FziJhxfBuxo4glJhrwR+QKnUtx70ulOA+ZJmArcDP6wxPyhpFnCWpHdRejpOsn19HRK6B3CrpH8Cf6DcbjCib9jequXxPGDeCGUPa3l8E4/sdYuI6AvJR5KPRLQzYdWqwSOWIiIiIiIiIiI6I1MtIiIiIiIiIqJr0vAQEREREREREV2ThoeIiIiIiIiI6Jo0PERERERERERE16ThISIiIiIiIiK6Jg0PEREREREREdE1aXiIiIiIiIiIiK75L6uKUhivTWngAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from sklearn.linear_model import Lasso\n", "\n", "# some values you can try out: 0.00001, 0.0001, 0.001, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10, 20\n", "alpha = 0.01\n", "lasso_model = Lasso(alpha=alpha, max_iter=1000).fit(X_train_df_N, y_train)\n", "\n", "print('R squared score for our original OLS model: {}'.format(r2_val[-1]))\n", "print('R squared score for Lasso with alpha={}: {}'.format(alpha, lasso_model.score(X_val_df_N,y_val)))\n", "\n", "# Compare the coefficient distributions: original OLS model (left) vs. lasso (right)\n", "fig, ax = plt.subplots(figsize=(18,8), ncols=2)\n", "ax = ax.ravel()\n", "ax[0].hist(model_N.params, bins=10, alpha=0.5)\n", "ax[0].set_title('Histogram of coefficient values for Original model with N: {}'.format(N))\n", "ax[0].set_xlabel('Coefficient values')\n", "ax[0].set_ylabel('Frequency')\n", "\n", "ax[1].hist(lasso_model.coef_.flatten(), bins=20, alpha=0.5)\n", "ax[1].set_title('Histogram of coefficient values for Lasso Model with alpha: {}'.format(alpha))\n", "ax[1].set_xlabel('Coefficient values')\n", "ax[1].set_ylabel('Frequency');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model Selection and Cross-Validation\n", "\n", "Here's our setup so far:\n", "\n", "\n", "\n", "So do we just try out 10,000 different models on our validation set and pick the best one? No! **With that many comparisons, we could end up overfitting the validation set as well.**\n", "\n", "One solution to the problems raised by using a single validation set is to evaluate each model on multiple validation sets and average the validation performance. 
This is the essence of cross-validation!\n", "\n", "\n", "\n", "Image source: [here](https://medium.com/@sebastiannorena/some-model-tuning-methods-bfef3e6544f0)\n", "\n", "Let's give this a try using [RidgeCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html) and [LassoCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html):" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R^2 score for our original OLS model: -1.8608470610311345\n", "\n", "Best alpha for ridge: 1000.0\n", "R^2 score for Ridge with alpha=1000.0: 0.5779474940635888\n", "\n", "Best alpha for lasso: 0.01\n", "R squared score for Lasso with alpha=0.01: 0.5975930359800542\n" ] } ], "source": [ "from sklearn.linear_model import RidgeCV\n", "from sklearn.linear_model import LassoCV\n", "\n", "alphas = (0.001, 0.01, 0.1, 10, 100, 1000, 10000)\n", "\n", "# Cross-validate over the candidate alphas.\n", "# Note: k is defined but never passed to the estimators (e.g. as cv=k), so the\n", "# library defaults apply here: RidgeCV uses efficient leave-one-out CV and\n", "# LassoCV uses its default number of folds.\n", "k = 4\n", "fitted_ridge = RidgeCV(alphas=alphas).fit(X_train_df_N, y_train)\n", "fitted_lasso = LassoCV(alphas=alphas).fit(X_train_df_N, y_train)\n", "\n", "print('R^2 score for our original OLS model: {}\\n'.format(r2_val[-1]))\n", "\n", "ridge_a = fitted_ridge.alpha_\n", "print('Best alpha for ridge: {}'.format(ridge_a))\n", "print('R^2 score for Ridge with alpha={}: {}\\n'.format(ridge_a, fitted_ridge.score(X_val_df_N,y_val)))\n", "\n", "lasso_a = fitted_lasso.alpha_\n", "print('Best alpha for lasso: {}'.format(lasso_a))\n", "print('R squared score for Lasso with alpha={}: {}'.format(lasso_a, fitted_lasso.score(X_val_df_N,y_val)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also look at the coefficients of our CV models." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Final Step:** Report the test-set score for the model you have chosen as the best." 
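, "\n", "\n",
"The workflow above can be sketched end to end as follows. This is a minimal, illustrative sketch rather than the section's own solution: it uses synthetic data in place of the King County splits, the names `X_tr_s`, `X_val_s`, `X_te_s`, and `candidates` are hypothetical, and `cv=k` is passed explicitly so that k-fold cross-validation is actually used when choosing alpha.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.linear_model import LassoCV, RidgeCV\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Synthetic stand-in data (an assumption; swap in the section's own splits)\n",
"rng = np.random.RandomState(0)\n",
"X = rng.normal(size=(300, 10))\n",
"y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)\n",
"\n",
"# 60/20/20 split; the test set is never used for tuning or model selection\n",
"X_rest, X_te, y_rest, y_te = train_test_split(X, y, test_size=0.2, random_state=0)\n",
"X_tr, X_val, y_tr, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)\n",
"\n",
"# Standardize with statistics computed on the training split only\n",
"scaler = StandardScaler().fit(X_tr)\n",
"X_tr_s, X_val_s, X_te_s = (scaler.transform(A) for A in (X_tr, X_val, X_te))\n",
"\n",
"alphas = [0.001, 0.01, 0.1, 1, 10, 100]\n",
"k = 4  # number of folds, passed explicitly as cv=k\n",
"\n",
"# Each CV estimator picks its own alpha by k-fold CV on the training split\n",
"candidates = {\n",
"    'ridge': RidgeCV(alphas=alphas, cv=k).fit(X_tr_s, y_tr),\n",
"    'lasso': LassoCV(alphas=alphas, cv=k).fit(X_tr_s, y_tr),\n",
"}\n",
"\n",
"# Choose between the two tuned models using the validation split\n",
"val_scores = {name: m.score(X_val_s, y_val) for name, m in candidates.items()}\n",
"best_name = max(val_scores, key=val_scores.get)\n",
"best_model = candidates[best_name]\n",
"\n",
"# Score the chosen model on the test set once, at the very end\n",
"print('Chosen model: {} (alpha={})'.format(best_name, best_model.alpha_))\n",
"print('Test R^2: {}'.format(best_model.score(X_te_s, y_te)))\n",
"```\n",
"\n",
"Note that the test set is scored exactly once, after both alpha and the model family have been fixed; the number it gives should be reported as is, not used for further tuning."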
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----------------\n", "### End of Standard Section\n", "---------------" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }