Homework 3

Due Date: Thursday, October 10th at 11:59 PM

There are 4 problems in this homework.

Problem 0 Homework Workflow [10pts]
Problem 1 Using Git Revert [5pts]
Problem 2 Writing a Linear regression class using Object Oriented Programming (OOP) paradigm [45pts]
Problem 3 Writing Bank Account software using Object Oriented Programming (OOP) paradigm [40pts]


Problem 0: Course Workflow [10pts]

Once you receive HW2 feedback (no later than Friday Oct 4), you will need to merge your HW2-dev branch into master. This counts for points so ensure you do this.

You will be awarded points for following all stages of the git workflow, which involves:

  • 3pts for merging your HW2-dev into master
  • 5pts for completing HW3 on the HW3-dev branch
  • 2pts for making a PR on HW3-dev to merge into master

Problem 1: Using Git Revert [5pts]

You have used the git revert command a few times throughout this class. This problem will teach you how to revert multiple commits all in one shot, which can be very useful when you have made many changes that broke your code or accidentally merged a branch you did not want to.

  1. Start on your HW3-dev branch in your HW3/ directory.
    1. Make a file called hello.txt with one line that says This is your_name. Try using the echo command.
    2. Create a commit for hello.txt with the message made hello.txt.
  2. Off of HW3-dev, create and switch to a new branch called HW3-P1.
    1. You should see that hello.txt exists on HW3-P1.
    2. Change your_name in hello.txt to the name of your favorite celebrity or athlete. So now the file should read This is your_favorite_celebrity.
    3. Commit this change with a message that says P1 - Changed name to your_favorite_celebrity.
    4. Delete hello.txt and commit with a message that says P1 - deleted hello.txt.
    5. Verify that hello.txt is deleted and that your HW3-P1 branch is 2 commits ahead of HW3-dev.
  3. Switch back to your HW3-dev branch.
    1. Verify that your hello.txt says This is your_name. Why is this the case? Make sure you understand this. You do not need to submit an answer.
    2. Merge your HW3-P1 branch into HW3-dev.
    3. Verify that your hello.txt is now gone. If hello.txt was the only file in your HW3/ directory, then you may find that HW3/ is gone as well. This is expected.

You made some changes to hello.txt on your HW3-P1 branch and merged those changes into your HW3-dev branch. But what if you decide that you did not want to delete hello.txt or change your name in hello.txt? This is where the git revert command is useful: it reverses the changes while still preserving the history!

  1. Verify that you're on your HW3-dev branch.
  2. Find the SHA for the commit that you made when you changed the name in hello.txt. If you are unfamiliar with git commit SHAs, this is a quick read to get you up-to-speed.
  3. Find the SHA for the commit that you made when you deleted hello.txt.
  4. Now, we want you to use the git revert command to reverse the changes of your two commits and then make only one additional commit with the reverted changes.
    1. Explore the git revert command (use the -h flag or Google) to figure out a way you can revert the changes of your two previous commits and then commit the revisions in a single commit.
    2. The message for your single commit should read something like P1 - reverted deletion of hello.txt and name change.
    3. If you use git revert older_commit_SHA^..newer_commit_SHA, then you will make two additional commits to revert your changes. This IS NOT what we want.
    4. Hint: Check out this Stack Overflow thread.
  5. Your commit log on your HW3-dev branch should have a sequence of 4 commits very close to the screenshot below. Please take a screenshot of your 4 commits, name the screenshot P1.png, and submit it.

Final deliverables:
P1.png


Problem 2: Linear Regression Class [45pts]

Explanation of Modules

So far, we have only been writing short Python scripts. However, when your code base starts to get bigger, you might want to organize your function and class definitions. The idea behind modules is to split your function and class definitions into multiple, logical units. When you want to use a function or class you simply import it from the module. In essence, a module is a file containing Python definitions and statements.
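
As a minimal sketch of the idea, the snippet below writes a tiny module to a temporary directory and imports it under an alias. The module name `shapes_demo`, its function `square_area`, and the alias `sd` are hypothetical and exist only for illustration; in practice you would create the `.py` file by hand next to your script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents: a module is just a file of definitions.
MODULE_SOURCE = '''
def square_area(side):
    """Return the area of a square with the given side length."""
    return side * side
'''

# Write the module file into a temporary directory (illustration only).
module_dir = tempfile.mkdtemp()
Path(module_dir, "shapes_demo.py").write_text(MODULE_SOURCE)

# Make the directory importable, then import the module under an alias.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
import shapes_demo as sd

area = sd.square_area(3)
print(area)
```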

In this problem, you will create a module called Regression with custom Python classes for two related types of linear regression: Ordinary Least Squares Linear Regression and Ridge Regression.

You are prohibited from using standard regression libraries in Python such as sklearn. These classes must be your own. However, you are permitted to check your answers against the standard libraries. You may use numpy to perform simple computations such as computing averages.

Background

Consider the multivariate linear model: $$y = X\beta$$ where $y$ is a length $n$ vector, $X$ is an $n \times p$ matrix, and $\beta$ is a $p$ length vector of coefficients.

The goal is to find the coefficients $\beta$ so that the linear model fits the data the best. There are many approaches to this, but in this problem you will only consider two.

Ordinary Least Squares (OLS) Linear Regression

OLS Regression seeks to minimize the following cost function:

$$\|y - X\beta\|^{2}.$$

The best fit coefficients are given by:

$$\widehat{\beta} = (X^T X)^{-1}X^Ty$$

where $X^T$ is the transpose of the matrix $X$ and $(X^T X)^{-1}$ is the inverse of the matrix $X^T X$. Note that these are the coefficients that minimize the cost function given the data.
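
To make the formula concrete, here is a small numerical sketch on made-up, noise-free data with no intercept term. The data values and variable names are assumptions for illustration. The sketch solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly, which is a swapped-in but mathematically equivalent route when $X^T X$ is invertible.

```python
import numpy as np

# Toy data (assumed for illustration): y = 2*x1 - 3*x2 exactly, no noise.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
true_beta = np.array([2.0, -3.0])
y = X @ true_beta

# Normal equations: (X^T X) beta_hat = X^T y.
# Solving the linear system avoids computing the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```

Because the data are noise-free, the recovered coefficients should match `true_beta` up to floating-point error.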

Ridge Regression

Ridge Regression introduces an $L_{2}$ regularization so the new cost function is:

$$\|y - X\beta\|^{2}+\|\Gamma \beta \|^{2}.$$

where $\Gamma = \alpha I$ for some constant $\alpha$ and $I$ is the identity matrix.

The best fit coefficients for this case are given by: $$\hat{\beta} = (X^T X+\Gamma^T\Gamma)^{-1}X^Ty.$$
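
The ridge formula can be sketched in the same way. The function name `ridge_coefficients` and the toy data are hypothetical, no intercept is included, and the example is meant only to show that $\alpha = 0$ recovers the OLS solution while $\alpha > 0$ shrinks the coefficients.

```python
import numpy as np

def ridge_coefficients(X, y, alpha):
    """Closed-form ridge solution with Gamma = alpha * I (no intercept)."""
    p = X.shape[1]
    gamma = alpha * np.eye(p)
    # Solve (X^T X + Gamma^T Gamma) beta = X^T y.
    return np.linalg.solve(X.T @ X + gamma.T @ gamma, X.T @ y)

# Toy data (assumed for illustration): y = 2*x1 - 3*x2 exactly.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -3.0])

beta_ols = ridge_coefficients(X, y, alpha=0.0)    # reduces to OLS
beta_ridge = ridge_coefficients(X, y, alpha=1.0)  # penalized fit
print(beta_ols, beta_ridge)
```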

$R^2$ score

You will use the $R^{2}$ metric to assess the performance of the models. The $R^2$ score is defined as: $$\displaystyle R^{2} = 1-\dfrac{SS_{E}}{SS_{T}}$$ where $$SS_{T}=\sum_{i}{\left(y_{i}-\overline{y}\right)^2}$$ and $$SS_{E}=\sum_{i}{\left(y_{i} - \widehat{y_i}\right)^2}.$$

The ${y_i}$ are the original data values, $\overline{y}$ is the mean of the original data values, and $\widehat{y_i}$ are the values predicted by the model.
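
The definition translates almost line by line into numpy. The function name `r_squared` and the sample values below are assumptions for illustration; the two calls check the boundary cases of a perfect prediction and of always predicting the mean.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Compute R^2 = 1 - SS_E / SS_T from observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_e = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_t = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_e / ss_t

y_obs = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r_squared(y_obs, y_obs)                       # predictions equal the data
baseline = r_squared(y_obs, np.full(4, y_obs.mean()))   # always predict the mean
print(perfect, baseline)
```

A perfect fit gives a score of 1, and a model that always predicts $\overline{y}$ gives a score of 0.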

Part A: Base Class [10pts]

In a file called Regression.py, write a class called Regression with the following methods:

__init__(): Initializes an empty dictionary called params. Note that params should be a class attribute.

fit(X, y): Fits a linear model to $X$ and $y$. Stores best-fit parameters in the dictionary attribute called params. The first key should be the coefficients (not including the intercept) and the second key should be the intercept.

get_params(): Returns $\widehat{\beta}$ for the fitted model. Note that the fit method already stored the dictionary in params, so all you need to do is return that dictionary.

predict(X): Predict new values with the fitted model given $X$.

score(X, y): Returns the $R^2$ value of the fitted model.

set_params(): Manually set parameters of the linear model. The method should accept variable keyword arguments (**kwargs) containing model parameters. In this problem, it will be used to set the regularization coefficient $\alpha$ in the Ridge Regression model.

This parent class should throw a NotImplementedError for methods that are intended to be implemented by subclasses.
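
The two mechanisms described above, collecting `**kwargs` and raising `NotImplementedError` in a parent class, can be sketched on an unrelated toy example. The classes `Shape` and `Square` and the method `area` are hypothetical and are not part of the required interface.

```python
class Shape:
    """Hypothetical base class: defines the interface, not the behavior."""

    def __init__(self):
        self.params = {}

    def set_params(self, **kwargs):
        # Store arbitrary keyword arguments, e.g. set_params(side=2).
        self.params.update(kwargs)

    def area(self):
        # Subclasses are expected to override this method.
        raise NotImplementedError("area() must be implemented by a subclass")


class Square(Shape):
    def area(self):
        return self.params["side"] ** 2


square = Square()
square.set_params(side=3)
print(square.area())

try:
    Shape().area()
except NotImplementedError as err:
    print(f"Base class refused: {err}")
```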

Here is the interface for the Regression class:

In [ ]:
class Regression():

    def __init__(self):
        pass  # your code

    def get_params(self):
        pass  # your code

    def set_params(self, **kwargs):
        pass  # your code

    def fit(self, X, y):
        pass  # your code

    def predict(self, X):
        pass  # your code

    def score(self, X, y):
        pass  # your code

Part B: List every function definition inside the module Regression [5pts]

In a file called P1B.py, import the Regression class using an alias and print a list of every function that can be accessed through this class by using a built-in python function. The list should print to your Terminal screen. Ensure you only print the functions for the Regression class from the module.
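
One possible approach, shown here on a hypothetical `Greeter` class, is the built-in `dir()` function, which returns the attribute names reachable through an object in alphabetical order. Filtering out names that start with a double underscore is an assumption about what "functions" should mean here; adjust the filter to match what you want to print.

```python
class Greeter:
    """Hypothetical class used only to demonstrate introspection."""

    def hello(self):
        return "hello"

    def goodbye(self):
        return "goodbye"


# dir() lists every attribute name reachable through the class, including
# inherited special methods such as __init__ and __repr__.
all_names = dir(Greeter)

# Keep only callables whose names do not start with a double underscore.
public_methods = [name for name in all_names
                  if callable(getattr(Greeter, name))
                  and not name.startswith("__")]
print(public_methods)
```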

Part C: OLS Linear Regression [10pts]

Write a class called LinearRegression that implements the OLS Regression model described above and inherits the Regression class. Also place this class in Regression.py.

Hints:

  • Note that the linear model $X\beta$ can also include an intercept term (e.g. $\displaystyle \beta_{1} x_{1} + \beta_{0}$). This is handled by appending a column of ones to the feature matrix $X$. See the numpy.append documentation. You may want to consider doing the append inside your fit method.
  • The best-fit coefficients $\widehat{\beta}$ are determined by forming the inverse of $\displaystyle X^{T}X$. Rather than using the numpy.linalg.inv method, it would be better to use the pseudo-inverse.
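
Both hints can be sketched together on a one-feature toy problem. The data are made up, and placing the column of ones last (so the intercept is the last coefficient) is just one possible convention.

```python
import numpy as np

# Toy data (assumed for illustration): slope 2, intercept 1, no noise.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X[:, 0] + 1.0

# Append a column of ones so the last coefficient acts as the intercept.
ones = np.ones((X.shape[0], 1))
X_aug = np.append(X, ones, axis=1)

# Use the pseudo-inverse in place of an explicit inverse of X^T X.
beta_hat = np.linalg.pinv(X_aug.T @ X_aug) @ X_aug.T @ y
slope, intercept = beta_hat
print(slope, intercept)
```

The pseudo-inverse agrees with the ordinary inverse when $X^T X$ is invertible and still returns a least-squares solution when it is singular, for example when two feature columns are identical.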

Part D: Ridge Regression [10pts]

Write a class called RidgeRegression that implements Ridge Regression and inherits the LinearRegression class. Place this class in Regression.py.

Part E: Model Scoring [5pts]

You will use the Boston dataset for this part. You will want to split this dataset into a training and a test set. Please use an 80%-20% training-test split with a random_state=42, as seen below.

Import your Regression module using an alias. Instantiate your LinearRegression and RidgeRegression models. Using a for loop, fit (on the training data) and score (on the testing data) each model on the Boston dataset. Place this code in a file called model_scoring.py.

Note: Some of you may not be familiar with the train-test split pattern from the statistics world. All this means is that you take your dataset and split it into two parts, a training part and test part (often 80-20 split). You perform the analysis on the training data in order to determine the best-fit parameters in your model. Then, you use that model to make a prediction using data from the test set. Finally, you assess the performance of the model on the test set.
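
To show what the split does under the hood, here is a hand-rolled numpy sketch on made-up data. It is not how `train_test_split` is implemented and it will not reproduce the same rows as `random_state=42`; the seed, array sizes, and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)   # seed chosen arbitrarily for this sketch
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

# Shuffle the row indices, then cut at the 80% mark.
indices = rng.permutation(len(X))
cut = len(X) * 8 // 10
train_idx, test_idx = indices[:cut], indices[cut:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
print(X_train.shape, X_test.shape)
```

The model is fitted on `X_train` and `y_train` only; `X_test` and `y_test` are held back until scoring.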

Print out the $R^2$ value for each model and the parameters for the best model using the get_params() method. Use an $\alpha$ value of 0.1.

Hint: The code below demonstrates how to do a train-test split. It also demos the way a user should interact with your classes. There are pieces of code that are missing, so you will need to fix this. Note that the demo uses the diabetes dataset, which you should change to the Boston dataset.

In [ ]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
#import regression classes

dataset = datasets.load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(dataset['data'], 
                                                    dataset['target'], 
                                                    test_size=0.2, 
                                                    random_state=42)

alpha = 0.5
models = [model1(alpha), model2(alpha)]

for model in models:
    model.fit(X_train, y_train);

Part F: Visualize Model Performance [5pts]

Evaluate how the models perform for various values of $\alpha$. Place this code in a file called model_performance.py.

  • Plot $R^2$ versus $\alpha$.
  • Calculate the $R^2$ scores for each model.
  • Use at least 10 values for $\alpha \in [0.05, 1]$ for the Ridge regression model.
  • Plot the two lines on the same graph.
  • To change the parameters, use the set_params() method that you wrote.
  • Save the plot as P1F.png.

Be sure to create a readable and interpretable plot! This means axes labels should be present and readable and the plot should have a legend.

Final Deliverables:

  1. Regression.py
  2. P1B.py
  3. model_scoring.py
  4. model_performance.py
  5. P1F.png

Problem 3: Bank Account Revisited [40pts]

We are going to redo the bank account closure problem from HW2, only this time developing a formal class for a Bank User and Bank Account to use in our closure (recall previously we just had a nonlocal variable amount that we changed).

IMPORTANT

We will be grading this problem with a test suite. Put the enum, classes, and closure in a single file called Bank.py.

It is very important that the class and method specifications we provide in the problem description are used (with the same capitalization), otherwise you will receive no credit.

Some Preliminaries:

Open a Jupyter notebook and try the following code. This does not need to be submitted and will not be graded. This is just to get you warmed up with some Enum types.

First define two types of bank accounts.

In [ ]:
from enum import Enum
class AccountType(Enum):
    SAVINGS = 1
    CHECKING = 2

Now, just to orient you, let's explore this class a little bit.

In [ ]:
AccountType.SAVINGS

returns a Python representation of an enumeration.

You can compare these account types:

In [ ]:
AccountType.SAVINGS == AccountType.SAVINGS
In [ ]:
AccountType.SAVINGS == AccountType.CHECKING

Note that to get a string representation of an Enum, you can use:

In [ ]:
AccountType.SAVINGS.name

Part A: Create a BankAccount class:

Constructor is BankAccount(self, owner, accountType) where owner is a string representing the name of the account owner and accountType is one of the AccountType enums.

Methods to modify the account balance of the account:

  • withdraw(self, amount)
  • deposit(self, amount)
    Note: You should raise an error or exception with these methods in 2 situations and provide informative error messages. The 2 situations you should handle are:
    1. You should not be able to withdraw more money than the balance of the account.
    2. You should not be able to withdraw or deposit a negative amount.
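
As a sketch of what an informative error can look like, the standalone helper below validates a withdrawal before applying it. The function name `check_withdrawal`, the choice of `ValueError`, and the message wording are all assumptions; your class may use different exception types as long as the messages are clear.

```python
def check_withdrawal(balance, amount):
    """Hypothetical helper: validate a withdrawal before applying it."""
    if amount < 0:
        raise ValueError(f"Cannot withdraw a negative amount: {amount}")
    if amount > balance:
        raise ValueError(
            f"Insufficient funds: requested {amount}, available {balance}")
    return balance - amount


print(check_withdrawal(100, 30))
try:
    check_withdrawal(100, 500)
except ValueError as err:
    print(err)
```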

Override the following methods:

  • __str__ to write an informative string of the account owner and the type of account.
    • Remember: This is different than __repr__.
  • __len__ to return the balance of the account.
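
The difference between these special methods can be seen on an unrelated toy class. `Playlist` and its attributes are hypothetical and serve only to show which built-in function triggers which method.

```python
class Playlist:
    """Hypothetical class demonstrating __str__, __repr__, and __len__."""

    def __init__(self, title, songs):
        self.title = title
        self.songs = list(songs)

    def __str__(self):
        # Human-readable summary, used by print() and str().
        return f"Playlist '{self.title}' with {len(self.songs)} songs"

    def __repr__(self):
        # Unambiguous representation, used by repr() and the interpreter prompt.
        return f"Playlist({self.title!r}, {self.songs!r})"

    def __len__(self):
        # len() requires a non-negative integer result.
        return len(self.songs)


mix = Playlist("Road Trip", ["A", "B"])
print(str(mix))
print(repr(mix))
print(len(mix))
```

Python requires `__len__` to return a non-negative integer, so a balance stored as a float would need to be converted (for example with `int()`) before being returned.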

Put this class in your Bank.py file.

Part B: Write a BankUser class with the following specification:

Constructor BankUser(self, owner) where owner is a String that represents the name of the account owner.

Method addAccount(self, accountType):

  • To start, a user will have no accounts when the BankUser object is created.
  • addAccount will add a new BankAccount account to the user based on the accountType specified.
  • A user may have at most one savings account and one checking account. Raise an appropriate error otherwise.

Additional required methods:

  • getBalance(self, accountType)
  • deposit(self, accountType, amount)
  • withdraw(self, accountType, amount)
    Note: You should raise an error or exception in these 3 methods when a BankUser does not have the account specified by accountType.

Override __str__ to have an informative summary of user's accounts.

Put this class in your Bank.py file.

Write tests to make sure all of the methods for this class work properly. You should also test that your methods properly handle all of the invalid situations for both the BankUser and BankAccount classes.

Submit these tests in a file called P3B_tests.py.

Hint: Use try/except blocks (as seen below) to print out errors/exceptions thrown by the methods, so the entire python script can be run. Feel free to read more here.

In [ ]:
from Bank import AccountType, BankUser  # classes defined in your Bank.py

def test_over_withdrawal():
    # This test should trigger an Exception or Error on the over-withdrawal.
    user = BankUser("Joe")
    user.addAccount(AccountType.SAVINGS)
    user.deposit(AccountType.SAVINGS, 10)
    try:
        user.withdraw(AccountType.SAVINGS, 1000)  # this should throw an Exception or Error
    except Exception as e:
        print(e)  # print the message for the Exception

test_over_withdrawal()
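
If you want a test to report explicitly whether the expected error occurred, one option is a small wrapper built on try/except/else. The helper name `expect_error` is hypothetical, and the built-in `int` is used here only as a stand-in for a method that may or may not raise.

```python
def expect_error(func, *args):
    """Hypothetical helper: report whether calling func(*args) raises."""
    try:
        func(*args)
    except Exception as err:
        print(f"PASS: raised {type(err).__name__}: {err}")
        return True
    else:
        # The else branch runs only when no exception was raised.
        print("FAIL: no exception was raised")
        return False


raised = expect_error(int, "not a number")   # int() rejects this string
not_raised = expect_error(int, "42")         # valid input, nothing is raised
```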

Part C: ATM Closure

Finally, we are going to rewrite a closure to use our bank account. We will make use of the input function which takes user input to decide what actions to take.

Write a closure called ATMSession(bankUser) which takes in a BankUser object. It should return a function called Interface that, when called, would provide the following interface:

  1. First prompt for a user will look like:

  2. Pressing 1 will exit, any other option will show a second set of options:

  3. If a deposit or withdraw was chosen, then there must be a third prompt:

  4. Upon finishing a transaction or viewing balance, it should go back to the original prompt.

This is to keep the code relatively simple. If you'd like, you can also tailor the options to the BankUser object (for example, if the user has no accounts then only show the Create Account option), but this is up to you. In any case, you must handle any input from the user in a reasonable way that an actual bank would be okay with, and give the user a proper response to the action specified.
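
The general shape of a closure that drives a prompt loop can be sketched as follows. This is not the required ATMSession: the names `make_session`, `interface`, and `visits`, the single prompt, and the injectable `read` argument (which defaults to `input` and is swapped for scripted answers so the sketch runs unattended) are all assumptions for illustration.

```python
def make_session(owner, read=input):
    """Hypothetical closure: owner and read are captured by the inner function."""
    visits = 0  # state that persists inside the returned function

    def interface():
        nonlocal visits
        while True:
            choice = read("Enter 1 to exit, anything else to continue: ")
            if choice.strip() == "1":
                return visits
            visits += 1
            print(f"{owner} has made {visits} selection(s)")

    return interface


# Scripted answers stand in for keyboard input.
answers = iter(["2", "3", "1"])
session = make_session("Joe", read=lambda prompt: next(answers))
total = session()
print(total)
```

The inner function keeps access to `owner` and `visits` after `make_session` has returned, which is the same mechanism that lets the returned function keep working with the object passed to the enclosing function.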

Part D: Put everything in a module Bank.py

We will be grading this problem with a test suite. Put the enum, classes, and closure in a single file named Bank.py. It is very important that the classes and method specifications we provided are used (with the same capitalization), otherwise you will receive no credit.

Final Deliverables:

  1. Bank.py [30 pts]
  2. P3B_tests.py [10 pts]