{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Title :\n",
    "Exercise - Logistic Regression\n",
    "\n",
    "## Description :\n",
    "\n",
    "Fit logistic regression models using:\n",
    "\n",
    "SKLearn <a href=\"https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html\" target=\"_blank\">LogisticRegression</a> (sklearn.linear_model.LogisticRegression)\n",
    "\n",
    "Statsmodels <a href=\"https://www.statsmodels.org/stable/generated/statsmodels.discrete.discrete_model.Logit.html\" target=\"_blank\">Logit</a> (statsmodels.api.Logit)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import libraries\n",
    "\n",
    "import pandas as pd \n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.linear_model import LogisticRegression, LinearRegression\n",
    "import statsmodels.api as sm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "heart = pd.read_csv('Heart.csv')\n",
    "\n",
    "# Force the response into a binary indicator:\n",
    "heart['AHD'] = 1*(heart['AHD'] == \"Yes\")\n",
    "\n",
    "heart.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make a plot of the response (AHD) vs the predictor (Age)\n",
    "\n",
    "plt.plot(heart[['Age']].values, heart['AHD'].values ,'o', markersize=7,color=\"#011DAD\",label=\"Data\")\n",
    "\n",
    "plt.xticks(np.arange(18, 80, 4.0))\n",
    "plt.xlabel(\"Age\")\n",
    "plt.ylabel(\"AHD\")\n",
    "plt.yticks((0,1), labels=('No', 'Yes'))\n",
    "\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# split into train and validation\n",
    "heart_train, heart_val = train_test_split(heart, train_size = 0.75, random_state = 5)\n",
    "\n",
    "# select variables for model estimation\n",
    "x_train = heart_train[['Age']]\n",
    "y_train = heart_train['AHD']\n",
    "\n",
    "x_val = heart_val[['Age']]\n",
    "y_val = heart_val['AHD']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Simple linear regression model fitting\n",
    "\n",
    "Define and fit a linear regression model to predict `Age` from `MaxHR`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a linear regression model, with random state=5\n",
    "\n",
    "regress1 = LinearRegression(fit_intercept=True).fit(x_train,y_train)\n",
    "\n",
    "print(\"Linear Regression Estimated Betas:\",regress1.intercept_,regress1.coef_[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot the estimated probability for training data\n",
    "dummy_x=np.linspace(np.min(x_train)-30,np.max(x_train)+30)\n",
    "yhat_regress = regress1.predict(dummy_x.reshape(-1,1))\n",
    "plt.plot(x_train, y_train, 'o' ,alpha=0.2, label='Data')\n",
    "plt.plot(dummy_x, yhat_regress, label = \"OLS\")\n",
    "\n",
    "plt.ylim(-0.2, 1.2)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What could go wrong with this linear regression model? "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*your answer here*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Simple logisitc regression model fitting\n",
    "\n",
    "Define and fit a logistic regression model with random state=5 to predict `Age` from `MaxHR`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "### edTest(test_logit1) ###\n",
    "# Create a logistic regression model, with random state=5 and no penalty\n",
    "\n",
    "logit1 = ___(penalty=___, max_iter = 1000, random_state=5)\n",
    "\n",
    "#Fit the model using the training set\n",
    "\n",
    "logit1.fit(x_train,y_train)\n",
    "\n",
    "# Get the coefficient estimates\n",
    "\n",
    "print(\"Logistic Regression Estimated Betas (B0,B1):\",logit1.intercept_,logit1.coef_)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Interpret the Coefficient Estimates"
   ]
  },
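  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a reference for the interpretation, the fitted model can be written on the log-odds scale. This is a sketch in notation assumed here rather than defined elsewhere in the notebook: $p$ stands for $P(\\text{AHD}=1 \\mid \\text{Age})$, and $\\hat{\\beta}_0$, $\\hat{\\beta}_1$ are the printed estimates (B0, B1).\n",
    "\n",
    "$$\\ln\\left(\\frac{p}{1-p}\\right) = \\hat{\\beta}_0 + \\hat{\\beta}_1 \\cdot \\text{Age}$$\n",
    "\n",
    "Under this form, a one-year increase in `Age` changes the log-odds of AHD by $\\hat{\\beta}_1$, which multiplies the odds by $e^{\\hat{\\beta}_1}$."
   ]
  },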
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Calculate the estimated probability that a person with age 60 will have AHD in the ICU."
   ]
  },
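  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A logistic model turns its linear predictor into a probability through the sigmoid function, so one way to set up this calculation is sketched here. The symbols are assumptions introduced for illustration: $\\hat{\\beta}_0$ and $\\hat{\\beta}_1$ denote the intercept and `Age` coefficient printed for `logit1`, and $x$ is the age of interest.\n",
    "\n",
    "$$\\hat{p}(x) = \\frac{1}{1 + e^{-(\\hat{\\beta}_0 + \\hat{\\beta}_1 x)}}$$\n",
    "\n",
    "Substituting $x = 60$ and the two printed estimates gives the estimated probability of AHD."
   ]
  },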
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**your answer here**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Confirm the probability calculation above using logit1.predict()\n",
    "# Be careful as to how you define the new observation.  Hint: double brackets is one way to do it\n",
    "\n",
    "logit1.predict_proba([[___]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Accuracy computation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "### edTest(test_accuracy) ###\n",
    "\n",
    "# Compute the training & validation accuracy \n",
    "\n",
    "train_accuracy = logit1.___(x_train , y_train)\n",
    "val_accuracy = logit1.___(x_val , y_val)\n",
    "\n",
    "# Print the two accuracies below\n",
    "\n",
    "print(\"Train Accuracy\", train_accuracy)\n",
    "print(\"Validation Accuracy\", val_accuracy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plot the predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "x=np.linspace(np.min(heart[['Age']])-10,np.max(heart[['Age']])+10,200)\n",
    "\n",
    "yhat_class_logit = logit1.predict(x)\n",
    "yhat_prob_logit = logit1.predict_proba(x)[:,1]\n",
    "\n",
    "# plot the observed data\n",
    "plt.plot(x_train, y_train, 'o' ,alpha=0.1, label='Train Data')\n",
    "plt.plot(x_val, 0.94*y_val+0.03, 'o' ,alpha=0.1, label='Validation Data')\n",
    "\n",
    "# plot the predictions\n",
    "plt.plot(x, yhat_class_logit, label='logit1 Classifications')\n",
    "plt.plot(x, yhat_prob_logit, label='logit1 Probabilities')\n",
    "\n",
    "# put the lower-left part of the legend 5% to the right along the x-axis, and 45% up along the y-axis\n",
    "plt.legend(loc=(0.05,0.45))\n",
    "\n",
    "# Don't forget your axis labels!\n",
    "plt.xlabel(\"Age\")\n",
    "plt.ylabel(\"Heart disease (AHD)\")\n",
    "\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Statistical Inference\n",
    "Train a new logistic regression model using statsmodels package. Print model summary and interpret the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {},
   "outputs": [],
   "source": [
    "### edTest(test_logit2) ###\n",
    "# adding a column of ones to X\n",
    "x_train_with_constant = sm.add_constant(x_train)\n",
    "x_val_with_constant = sm.add_constant(x_val)\n",
    "\n",
    "# train a new model using statsmodels package\n",
    "logreg = sm.___(y_train, x_train_with_constant).fit()\n",
    "print(logreg.summary())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What is an estimated 95% confidence interval for the coefficient corresponding to 'Age' variable?"
   ]
  },
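  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, a common large-sample (Wald) construction of this interval is sketched below. The notation is assumed here for illustration: $\\hat{\\beta}_1$ is the estimated `Age` coefficient and $\\widehat{SE}(\\hat{\\beta}_1)$ is its standard error.\n",
    "\n",
    "$$\\hat{\\beta}_1 \\pm 1.96 \\cdot \\widehat{SE}(\\hat{\\beta}_1)$$\n",
    "\n",
    "The `coef` and `std err` columns of the statsmodels summary supply the two inputs, and the `[0.025` and `0.975]` columns report the interval bounds directly."
   ]
  },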
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*your answer here*"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}