{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Title :\n",
"Exercise: Linear and Polynomial Regression with Residual Analysis\n",
"\n",
"## Description :\n",
"The goal of this exercise is to fit linear regression and polynomial regression to the given data. Plot the fit curves of both the models along with the data and observe what the residuals tell us about the two fits. \n",
"\n",
"\n",
"\n",
"## Data Description:\n",
"\n",
"## Instructions:\n",
"- Read the poly.csv file into a dataframe.\n",
"- Split the data into train and test subsets.\n",
"- Fit a linear regression model on the entire data, using `LinearRegression()` object from Sklearn library.\n",
"- Guesstimate the degree of the polynomial which would best fit the data.\n",
"- Fit a polynomial regression model on the computed Polynomial Features using `LinearRegression()` object from sklearn library.\n",
"- Plot the linear and polynomial model predictions along with the test data.\n",
"- Compute the polynomial and linear model residuals using the formula below $\\epsilon = y_i - \\hat{y}$\n",
"- Plot the histogram of the residuals and comment on your choice of the polynomial degree. \n",
"\n",
"## Hints: \n",
"\n",
"pd.DataFrame.head()\n",
"Returns a pandas dataframe containing the data and labels from the file data.\n",
"\n",
"sklearn.model_selection.train_test_split()\n",
"Splits the data into random train and test subsets.\n",
"\n",
"plt.subplots()\n",
"Create a figure and a set of subplots.\n",
"\n",
"sklearn.preprocessing.PolynomialFeatures()\n",
"Generates a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree.\n",
"\n",
"sklearn.preprocessing.StandardScaler.fit_transform()\n",
"Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.\n",
"\n",
"sklearn.linear_model.LinearRegression\n",
"LinearRegression fits a linear model.\n",
"\n",
"sklearn.linear_model.LinearRegression.fit()\n",
"Fits the linear model to the training data.\n",
"\n",
"sklearn.linear_model.LinearRegression.predict()\n",
"Predict using the linear model.\n",
"\n",
"plt.plot()\n",
"Plots x versus y as lines and/or markers.\n",
"\n",
"plt.axvline()\n",
"Add a vertical line across the axes.\n",
"\n",
"ax.hist()\n",
"Plots a histogram.\n",
"\n",
"**Note:** This exercise is auto-graded and you can try multiple attempts. "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Import necessary libraries\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"%matplotlib inline\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | x | \n", "y | \n", "
---|---|---|
0 | \n", "-3.292157 | \n", "-46.916988 | \n", "
1 | \n", "0.799528 | \n", "-3.941553 | \n", "
2 | \n", "-0.936214 | \n", "-2.800522 | \n", "
3 | \n", "-4.722680 | \n", "-103.030914 | \n", "
4 | \n", "-3.602674 | \n", "-54.020819 | \n", "