{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Title :\n",
    "Exercise: Simple Multi-linear Regression\n",
    "\n",
    "## Description :\n",
    "The aim of this exercise is to understand how to use multi regression. Here we will observe the difference in MSE for each model as the predictors change. \n",
    "\n",
    "<img src=\"../fig/fig1.png\" style=\"width: 500px;\">\n",
    "\n",
    "## Data Description:\n",
    "\n",
    "## Instructions:\n",
    "\n",
    "- Read the file `Advertisement.csv` as a dataframe.\n",
    "- For each instance of the predictor combination, form a model. For example, if you have 2 predictors,  A and B, you will end up getting 3 models - one with only A, one with only B, and one with both A and B.\n",
    "- Split the data into train and test sets.\n",
    "- Compute the MSE of each model.\n",
    "- Print the Predictor - MSE value pair\n",
    "\n",
    "## Hints: \n",
    "\n",
    "<a href=\"https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html\" target=\"_blank\">pd.read_csv(filename)</a>\n",
    "Returns a pandas dataframe containing the data and labels from the file data.\n",
    "\n",
    "<a href=\"http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html\" target=\"_blank\">sklearn.preprocessing.normalize()</a>\n",
    "Scales input vectors individually to unit norm (vector length).\n",
    "\n",
    "<a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html\" target=\"_blank\">sklearn.model_selection.train_test_split()</a>\n",
    "Splits the data into random train and test subsets.\n",
    "\n",
    "<a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html\" target=\"_blank\">sklearn.linear_model.LinearRegression</a>\n",
    "LinearRegression fits a linear model.\n",
    "\n",
    "<a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.fit\" target=\"_blank\">sklearn.linear_model.LinearRegression.fit()</a>\n",
    "Fits the linear model to the training data.\n",
    "\n",
    "<a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.predict\" target=\"_blank\">sklearn.linear_model.LinearRegression.predict()</a>\n",
    "Predict using the linear model.\n",
    "\n",
    "<a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error\" target=\"_blank\">sklearn.metrics.mean_squared_error()</a>\n",
    "Computes the mean squared error regression loss\n",
    "\n",
    "**Note:** This exercise is auto-graded and you can try multiple attempts. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import necessary libraries\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn import preprocessing\n",
    "from prettytable import PrettyTable\n",
    "from sklearn.metrics import mean_squared_error\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from sklearn.model_selection import train_test_split\n",
    "%matplotlib inline\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reading the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the file \"Advertising.csv\"\n",
    "df = pd.read_csv(\"Advertising.csv\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>TV</th>\n",
       "      <th>Radio</th>\n",
       "      <th>Newspaper</th>\n",
       "      <th>Sales</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>230.1</td>\n",
       "      <td>37.8</td>\n",
       "      <td>69.2</td>\n",
       "      <td>22.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>44.5</td>\n",
       "      <td>39.3</td>\n",
       "      <td>45.1</td>\n",
       "      <td>10.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>17.2</td>\n",
       "      <td>45.9</td>\n",
       "      <td>69.3</td>\n",
       "      <td>9.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>151.5</td>\n",
       "      <td>41.3</td>\n",
       "      <td>58.5</td>\n",
       "      <td>18.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>180.8</td>\n",
       "      <td>10.8</td>\n",
       "      <td>58.4</td>\n",
       "      <td>12.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      TV  Radio  Newspaper  Sales\n",
       "0  230.1   37.8       69.2   22.1\n",
       "1   44.5   39.3       45.1   10.4\n",
       "2   17.2   45.9       69.3    9.3\n",
       "3  151.5   41.3       58.5   18.5\n",
       "4  180.8   10.8       58.4   12.9"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Take a quick look at the data to list all the predictors\n",
    "df.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create different multi predictor models "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "### edTest(test_mse) ###\n",
    "\n",
    "# Initialize a list to store the MSE values\n",
    "mse_list = []\n",
    "\n",
    "# List of all predictor combinations to fit the curve\n",
    "cols = [['TV'],['Radio'],['Newspaper'],['TV','Radio'],['TV','Newspaper'],['Radio','Newspaper'],['TV','Radio','Newspaper']]\n",
    "\n",
    "# Loop over all the predictor combinations \n",
    "for i in cols:\n",
    "\n",
    "    # Set each of the predictors from the previous list as x\n",
    "    x = df[i]\n",
    "    \n",
    "    \n",
    "    # Set the \"Sales\" column as the reponse variable\n",
    "    y = df[\"Sales\"]\n",
    "    \n",
    "   \n",
    "    # Split the data into train-test sets with 80% training data and 20% testing data. \n",
    "    # Set random_state as 0\n",
    "    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)\n",
    "\n",
    "    # Initialize a Linear Regression model\n",
    "    lreg = LinearRegression()\n",
    "\n",
    "    # Fit the linear model on the train data\n",
    "    lreg.fit(x_train, y_train)\n",
    "    \n",
    "    # Predict the response variable for the test set using the trained model\n",
    "    y_pred= lreg.predict(x_test)\n",
    "    \n",
    "    # Compute the MSE for the test data\n",
    "    MSE = mean_squared_error(y_test, y_pred)\n",
    "    \n",
    "    # Append the computed MSE to the list\n",
    "    mse_list.append(MSE)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Display the MSE with predictor combinations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+------------------------------+-------------------+\n",
      "|          Predictors          |        MSE        |\n",
      "+------------------------------+-------------------+\n",
      "|            ['TV']            | 10.18618193453022 |\n",
      "|          ['Radio']           | 24.23723303713214 |\n",
      "|        ['Newspaper']         | 32.13714634300907 |\n",
      "|       ['TV', 'Radio']        | 4.391429763581883 |\n",
      "|     ['TV', 'Newspaper']      | 8.687682675690592 |\n",
      "|    ['Radio', 'Newspaper']    | 24.78339548293816 |\n",
      "| ['TV', 'Radio', 'Newspaper'] | 4.402118291449686 |\n",
      "+------------------------------+-------------------+\n"
     ]
    }
   ],
   "source": [
    "# Helper code to display the MSE for each predictor combination\n",
    "t = PrettyTable(['Predictors', 'MSE'])\n",
    "\n",
    "for i in range(len(mse_list)):\n",
    "    t.add_row([cols[i],mse_list[i]])\n",
    "\n",
    "print(t)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}