Homework 5

Due Thursday, November 7th 2019 at 11:59 PM.

Problem 0. Homework Workflow [10 pts]

Problem 1. @property Decorator [20 points]

Problem 2. Linked List [40 pts]

Problem 3. Binary Search Tree [30 pts]

We will be using an autograder for this homework. Please don't change the specified function signature.

Problem 0: Homework Workflow

Once you receive HW4 feedback, you will need to merge your HW4-dev branch into master.

You will earn points for following all stages of the git workflow which involves:

  • 3pts for merging HW4-dev into master
  • 5pts for completing HW5 on HW5-dev
  • 2pts for making a PR on HW5-dev to merge into master

Problem 1: @property Decorator

Suppose we have a simple Animal class, which is initialized by its species:

class Animal:

    def __init__(self, name, species):
        self.name = name
        self.species = species

    def __repr__(self):
        return f'{self.name} ({self.species})'

We have clients who build applications on top of our Animal class and they want to directly modify the species attribute. For example, they may implement functions to transform animals.

def transform(animal, into):
    animal.species = into
    return animal

However, one day it occurred to us that our Animal class is designed to be used in an H. C. Andersen universe. We really don't want people to make an Animal instance into a Cthulhu [Cthulhu is not part of the H. C. Anderson universe]. How should we enforce this restriction? The obvious solution is to create a method to set the species, which checks if the new species is valid inside the method:

class Animal:

    # a class attribute of the valid species in our universe
    valid_species = {
        'cat',
        'dog',
        'duck',
        'elf',
        'goblin',
        'horse',
        'human',
        'mermaid',
        'nightingale',
        'pig',
        'swan',
        'wolf'
    }

    def __init__(self, species):
        self.species = species

    def __repr__(self):
        return f'{self.name} ({self.species})'

    def set_species(self, into):
        assert into in Animal.valid_species, Exception(f'invalid species: {into}')
        self.species = into

There are two problems with this implementation:

  1. If our clients don't know they have to change their implementation of transform (use set_species instead of directly assign the species attribute), people can still do things like a = transform(ugly_duckling, into='Cthulhu')
  2. We really don't want to bother our clients to change their code because they may have used animal.species = something hundreds of times throughout their code.

Fortunately, we have the property decorator. Read more about it here.

For this problem, you will utilize the property decorator to limit what species users can set their Animal object to. Please copy the following code into the file animal.py.

class Animal:

    # a class attribute of the valid species in our universe
    valid_species = {
        'cat',
        'dog',
        'duck',
        'elf',
        'goblin',
        'horse',
        'human',
        'mermaid',
        'nightingale',
        'pig',
        'swan',
        'wolf'
    }

    def __init__(self, name, species):
        self.name = name
        self._species = species

    def __repr__(self):
        return f'{self.name} ({self._species})'

Part A [10 points]:

We have also now made the species attribute somewhat private by calling it _species because we do not want the user to access it directly. However, this presents a problem because in our first version of the Animal class, a client could access the species through the species attribute. You can check that the attributes of the Animal class are name and _species using the following steps:

>>> dog = Animal('Fido', 'dog')
>>> vars(dog)
{'name': 'Fido',  '_species' : 'dog'}

To prevent the client's previous code from breaking, modify the Animal class in animal.py by adding a species(self) method to the Animal class. The species method should return the _species attribute of this Animal. Apply the property decorator to this method.

If this is done correctly, then a client should still be able to retrieve the species of an Animal object as follows:

>>> dog = Animal('Fido', 'dog')
>>> dog.species
'dog'

Part B [10 points]:

What happens when the client tries to set the value of the species attribute as in the following:

>>> dog.species = 'cat'

You don't need to write the answer to this question, but you should test it.

Thus to prevent the client's previous code from breaking, we need to create another method that:

  1. Allows the client to set the species of an Animal object using .species
  2. And checks that the species attribute is set to a species in our valid_species list.

Add another method species(self, into) to the Animal class. This method should set the _species attribute of this Animal to into, but before that, it makes sure that into is in the valid_species list. If into is not in the valid_species list, raise an Exception. Apply the species.setter decorator to this method.

If this is done correctly, then a client should still be able to set the species of an Animal object (within the restricted list) as follows:

>>> dog = Animal('Fido', 'dog')
>>> dog.species
'dog'
>>> dog.species = 'cat'
>>> dog.species
'cat'
>>> dog.species = 'TheThing'
...
AssertionError: invalid species: TheThing

Do our clients need to change the implementation of their transform function now? Nope, and hopefully they give you a bonus!

Deliverables

  • animal.py

Problem 2: Linked List

A singly linked list can be implemented by nodes. Each node has 2 properties: a head and a tail. The head is the element held by the first node in the list. The tail is the rest of the list, which is a reference to the next node.

The code below is a partially implemented singly linked list. It consists of 2 classes: LinkedList is a node that holds an element and Nil is a special node that indicates the end of the linked list. In this problem, you will finish the implementation of these 2 classes. To name a node LinkedList is a bit confusing, but as we will see, although the LinkedList is implemented as a node, it behaves as if it is a whole list.

We start by copying the following template code into the file linked_list.py.

Hint: Even though this problem seems long, the first 3 parts are solvable with a one-line statement and Part D is only slightly longer (depending your implementation choices). Also, you don't need any loops in this problem (you will not lose points if you do so, but you should find it quite unnatural to use loops here).

class LinkedList:

    def __init__(self, head, tail):
        assert isinstance(tail, LinkedList) or isinstance(tail, Nil), TypeError(
            'tail should either be a LinkedList or a Nil')
        self._head, self._tail = head, tail

    @property
    def head(self):
        return self._head

    @property
    def tail(self):
        return self._tail

    def __str__(self):
        return str(self._head) + ' -> ' + str(self._tail)

    def __repr__(self):
        return f'LinkedList({repr(self._head)}, {repr(self._tail)})'

    def __len__(self):
        return 1 + len(self._tail)

    def __getitem__(self, i):
        return self._head if i == 0 else self._tail[i-1]

    def prepend(self, val):
        pass # TODO

    def append(self, val):
        pass # TODO

    def for_each(self, fun):
        pass # TODO

    def summation(self):
        return self._head + self._tail.summation() if self._tail else self._head

    def minimum(self):
        def smaller(a, b):
            return a if a < b else b
        return smaller(self._head, self._tail.minimum()) if self._tail else self._head

    def reduce_right(self, fun):
        pass # TODO

class Nil():

    def __str__(self): 
        return 'Nil'

    def __repr__(self):
        return 'Nil()'

    def __len__(self):
        return 0

    def __getitem__(self, i):
        raise IndexError('index out of range')

    def __bool__(self): 
        return False

    def prepend(self, val): 
        pass # TODO

    def append(self, val):  
        return LinkedList(val, Nil())

    def for_each(self, fun):
        pass # TODO

Part A [15 points]:

For this problem, we want to make this linked list immutable and we are not updating its head or tail after it's been created.

Please implement the append method. It returns a LinkedList where a node holding the object val is appended.

Hint: It's recursive. We recursively invoke the append on the tail, the tail of the tail, and so on.

There is a price to pay for the immutability. If we want to append an element to this list, we will need to update the tail of the last LinkedList node, which breaks the immutability. Therefore, we need to re-create the last LinkedList node, but then we realize that we need to update the next-to-last LinkedList node by re-creating it as well and so on. It is essentially re-creating the whole new list with one more node. This is not very efficient. Fortunately, instead of appending an element to the end of this list, we can also prepend an element to the start of this list, which will be much more efficient.

Please implement the prepend method in both the LinkedList and Nil classes.

Note: Both append and prepend are returning a new object without mutating the object passed to it (which is our linked list, the self). In this problem, rather than imagining append and prepend as operations done on the original linked list, we imagine append and prepend as pure functions that map some input (original linked list) to an output (a linked list extended by a new node).

Part B [5 points]:

We want to be able to conveniently apply a function to all elements in this list. Please implement the for_each method (in both LinkedList and Nil), which takes a function fun as an argument, and returns a LinkedList (or a Nil) in which the element e in the list is mapped to fun(e). For example:

l = Nil().prepend(1).prepend(2).prepend(3).prepend(4)
def square(x):
    return x**2
print(l)
print(l.for_each(square))

gives

4 -> 3 -> 2 -> 1 -> Nil
16 -> 9 -> 4 -> 1 -> Nil

Note: Again, in the above example, for_each is not modifying the values in the LinkedList l. It is returning a new LinkedList.

Part C [10 points]:

If we look at the logic of summation and minimum, we will notice that they are quite similar. Both are first reducing the tail of the list into an intermediate result and then combining the head with the intermediate result. The difference between summation and minimum lies in what this combining function is.
For summation, two elements are combined by the function

def combine(a, b): 
    return a+b

(We don't really have this function in our code, but its logic is used in summation.) For minimum, two elements are combined by the hypothetical function

def combine(a, b):
    return a if a < b else b

(We don't really have this function in our code, but its logic is used in minimum.) Ideally, we should be able to create any combine function and apply it to the linked list. Please implement the reduce_right method, which requires a function fun as an argument. reduce_right will first reduce the tail of the list into an intermediate result by applying fun to the tail nodes. Then, fun should be applied to the head and the intermediate result to combine them.
For example:

l = Nil().prepend(1).prepend(2).prepend(3).prepend(4)
def smaller(a, b): # our "combine" function
    return a if a < b else b
print(l)
print(l.reduce_right(smaller))

gives

4 -> 3 -> 2 -> 1 -> Nil
1

Part D [10 points]:

For this problem, copy the following into longest_sentence.py:

def get_list_of_sentences(chapter1='swansway-chapter1.txt'):
    def to_sentences(p):
            for delimiter in '.?!': p = p.replace(delimiter, '|')
            return [s.strip('\" ') for s in p.split('|')]
    with open(chapter1, 'r', encoding='UTF-8') as f:
        paragraphs = f.readlines()

    sentences = [s for p in paragraphs for s in to_sentences(p) if len(s) > 1]
    list_of_sentences = Nil()
    for s in sentences[::-1]:
        list_of_sentences = list_of_sentences.prepend(s)
    return list_of_sentences


def longest_sentence():
    list_of_sentences = get_list_of_sentences()

    pass # TODO

Marcel Proust is known for writing long sentences. The swansway-chapter1.txt contains the first chapter of Remembrance of Things Past. The function get_list_of_sentences will preprocess this file and return a LinkedList of all sentences in this chapter. Your task is to implement longest_sentence which returns the number of words in the longest sentence. You can assume that anything delimited by a space is a word.

Note: get_list_of_sentences reads text from swansway-chapter1.txt. You should have the text file in your HW5-final/ directory.

Hint: Use the for_each and reduce_right methods that you implemented.

Deliverables

  • linked_list.py
  • longest_sentence.py

Problem 3: Binary Search Tree

A binary search tree (BST) is a binary tree in which:

  • The left subtree of a node contains only nodes with keys less than the node’s key.
  • The right subtree of a node contains only nodes with keys greater than the node’s key.
  • The left and right subtree each must also be a binary search tree.

The code below is a partially implemented binary search tree. BSTNode is a node in the binary search tree. It contains a key-value pair and has references to its child nodes. BSTTable is the binary search tree itself. It holds a reference to the root node of the tree and contains methods that manipulate the nodes. A BSTTable can be used as a table to store key-value pairs. Our implementation will only be able to handle the case when the keys are comparable to each other. Simple client code of BSTTable looks like this:

greektoroman = BSTTable()
greektoroman.put('Athena',    'Minerva')
greektoroman.put('Eros',      'Cupid')
greektoroman.put('Aphrodite', 'Venus')
greektoroman.get('Eros')

and will return

'Cupid'

In this problem, we will implement the get and put method of BSTTable.

We start by copying the following template code into the file binary_search_tree.py.

Hint: You can print a BSTTable or BSTNode to see its structure. It's not pretty, but should be enough for debugging purposes.

Note: You may assume that the keys are unique.

class BSTNode:

    def __init__(self, key, val):
        self.key, self.val = key, val
        self.left = None
        self.right = None
        self.size = 1

    def __str__(self):
        return f'BSTNode({self.key}, {self.val})' + \
               '\n|\n|-(L)->' + '\n|      '.join(str(self.left ).split('\n')) + \
               '\n|\n|-(R)->' + '\n|      '.join(str(self.right).split('\n'))


class BSTTable:
    def __init__(self):
        self._root = None

    def __str__(self):
        return str(self._root)

    def __len__(self):
        return self._size(self._root)

    def put(self, key, val):
        self._root = self._put(self._root, key, val)

    def get(self, key):
        return self._get(self._root, key)

    def _put(self, node, key, val):
        pass # TODO

    def _get(self, node, key):
        pass # TODO

    @staticmethod
    def _size(node):
        return node.size if node else 0

Part A [15 points]:

The put method inserts a key-value pair into the BSTTable. It delegates the function call to _put. Please implement the 'private' method _put. It takes a node, a key, and a val as arguments. node is the root of the subtree into which the key-value pair is to be inserted into. After the insertion, the root of this subtree may change. _put should return the new root of this subtree. The difference of the input subtree and the returned subtree is that in the returned subtree, a new key-value pair has been inserted. If the key was already in the subtree, then we update the value of the corresponding node.

Note: We assume all the keys are of the same type and are comparable to each other with >, < and ==. So if all your keys are strings, they will be ordered lexicographically. If all your keys are numbers, they will be ordered from the smallest to the largest.

Hint: This is recursive. Think about the base case: what should be returned if the node passed in is None? Which subtree (left/right) should we insert if the node is not None? Remember to update the size of the nodes.

Part B [15 points]:

The get method returns the value with the corresponding key. It delegates the function call to _get. Please implement the 'private' method _get. It takes a node and a key as an argument. It will search for a node with the same key under the subtree whose root is node. If there is such a node, return the val of this node. _get should raise a KeyError if no such node is found.

Deliverables

  • binary_search_tree.py