Homework 6

Due Thursday, November 21st 2019 at 11:59 PM.

Be sure to push the final version of your code to your GitHub repo. Follow the instructions on the course website.

Topics

Problem 0. Homework Workflow [10 pts]

  • Prerequisite

Problem 1: BST Extensions [30 pts]

  • Part A. Remove minimum node (_removemin)
  • Part B. Remove node (_remove)
  • Part C. BST Traversal

Problem 2: Heaps [30 pts]

Problem 3: Priority Queue [30 pts]

  • Part A. Naive priority queue
  • Part B. Priority queue based on min-heap
  • Part C. Compare two implementations of a priority queue

Note: We will be using autograders for the problems below so make sure you conform with the interfaces as described in the question statement.

Note: This homework may look very lengthy, but this is because we provide substantial code to you to help you start the problems.

Problem 0: Homework Workflow

Once you receive HW5 feedback, you will need to merge your HW5-dev branch into master.

You will earn points for following all stages of the git workflow which involves:

  • 3pts for merging HW5-dev into master
  • 5pts for completing HW6 on HW6-dev
  • 2pts for making a PR on HW6-dev to merge into master

Prerequisite Questions

We would like your feedback on the AD GUI tool that you used in Homework 4. Follow this link and answer the questions.

Problem 1 [30 pts]: BST Extensions

This problem builds on Problem 3 of Homework 5 in which you wrote a binary search tree. This time, you will first write some more supporting functions for your BST and then write a class to perform traversals on it.

Put your code for this problem in a file titled P1.py. Failure to follow this instruction will result in point deduction.

class BSTNode:

    def __init__(self, key, val):
        self.key, self.val = key, val
        self.left, self.right = None, None
        self.size = 1

    def __str__(self):
        return f'BSTNode({self.key}, {self.val})' + \
               '\n|\n|-(L)->' + '\n|      '.join(str(self.left ).split('\n')) + \
               '\n|\n|-(R)->' + '\n|      '.join(str(self.right).split('\n'))


class BSTTable:
    def __init__(self):
        self._root = None

    def __str__(self):
        return str(self._root)

    def __len__(self):
        return self._size(self._root)

    def put(self, key, val):
        self._root = self._put(self._root, key, val)

    def get(self, key):
        return self._get(self._root, key)

    def _put(self, node, key, val):
        if not node: 
            return BSTNode(key, val)
        if   key < node.key:
            node.left  = self._put(node.left,  key, val)
        elif key > node.key:
            node.right = self._put(node.right, key, val)
        else:
            node.val = val
        node.size = 1 + self._size(node.left) + self._size(node.right)
        return node

    def _get(self, node, key):
        if not node:
            raise KeyError(f'key not found: {key}')
        if   key < node.key:
            return self._get(node.left,  key)
        elif key > node.key:
            return self._get(node.right, key)
        else:
            return node.val

    @staticmethod
    def _size(node):
        return node.size if node else 0

Part A [10 pts]: Remove minimum node (_removemin)

Your task: Please implement the method _removemin in your BSTTable class. It takes a node as an argument. It will return the subtree with the smallest node (the node with the smallest key) removed.

In Part B of this problem, you will see that the _removemin function will help us reduce the complexity of a related function called _remove.

Hint: Remember to update the size of the nodes.

Hint: check out these slides, especially the BST node removal

Demo

t = BSTTable()
t.put(5, 'a')
t.put(1, 'b')
t.put(2, 'c')
t.put(0, 'd')
print(t._root)

BSTNode(5, a)
|
|-(L)->BSTNode(1, b)
|      |
|      |-(L)->BSTNode(0, d)
|      |      |
|      |      |-(L)->None
|      |      |
|      |      |-(R)->None
|      |
|      |-(R)->BSTNode(2, c)
|      |      |
|      |      |-(L)->None
|      |      |
|      |      |-(R)->None
|
|-(R)->None

print(t._removemin(t._root))

BSTNode(5, a)
|
|-(L)->BSTNode(1, b)
|      |
|      |-(L)->None
|      |
|      |-(R)->BSTNode(2, c)
|      |      |
|      |      |-(L)->None
|      |      |
|      |      |-(R)->None
|
|-(R)->None

Part B [10 pts]: Remove Node

The remove method removes the node with the corresponding key. It delegates the function calls to _remove, which takes a node and a key as an argument and returns the subtree without the node whose key is key.

Your task: Add and implement the following to the BSTTable class:

def remove(self, key):
    self._root = self._remove(self._root, key)

def _remove(self, node, key)
    # TODO: Should return a subtree whose root is  but without
    #       the node whose key is 
    pass

The method should raise a KeyError if no node whose key is key is found.

Hints:

  • As in _removemin, remember to update the size of the nodes.
  • Remember that after the removal of a node, the returned subtree should still be a binary search tree.
  • Your _remove should be recursive. Think about which situations you should be able to handle.
    • What if the current node's key is greater/smaller than the key we want to remove?
    • What if we reach the node we are looking for? How do you make sure the subtree you return is still a binary tree?
    • How can you use _removemin to make your life a little easier? Note: you aren't required to use _removemin here but it can reduce the amount of code duplication

Demo

t = BSTTable()
t.put(5, 'a')
t.put(1, 'b')
t.put(2, 'c')
t.put(0, 'd')
print(t._remove(t._root, 5))

BSTNode(1, b)
|
|-(L)->BSTNode(0, d)
|      |
|      |-(L)->None
|      |
|      |-(R)->None
|
|-(R)->BSTNode(2, c)
|      |
|      |-(L)->None
|      |
|      |-(R)->None


print(t._remove(t._remove(t._root, 5), 1))

BSTNode(2, c)
|
|-(L)->BSTNode(0, d)
|      |
|      |-(L)->None
|      |
|      |-(R)->None
|
|-(R)->None

What should this return?

print(t._remove(t._root, 10))

Part C [10 pts]: BST Traversal

During lectures we discussed three types of depth-first traversal (DFS): preorder, inorder, and postorder. In addition to the lecture notes, here is another reference: Tree Traversal.

Your task: Write an iterator class called DFSTraversal that allows us to traverse a BST according to a user-specified order.

The DFSTraversal class should follow following specifications:

  • The constructor takes a BSTTable object called tree and one of the enums from DFSTraversalTypes (see skeleton below).
    • Note: You should be using the enums and not regular integers 1, 2, or 3.
  • __iter__(self): This is the initialization of an iterator
  • __next__(self): This is called in the iterator for getting the next BSTNode in the tree. This dunder method should raise a StopIteration error when the entire BST has been traversed.
  • inorder, preorder, postorder: these will perform the actual traversal

Hint: inorder, preorder, postorder should not return anything; they perform traversal of your tree. To figure out the action they perform look at the demo below

Demo of DFSTraversal

A use case of the DFSTraversal class:

input_array = [(4, 'a'), (9, 'c'), (2, 'f'), (3, 'z'), (11, 'i'), (8, 'r')]
bst = BSTTable()
for key, val in input_array:
    bst.put(key, val)
traversal = DFSTraversal(bst, DFSTraversalTypes.INORDER)
for node in traversal:
    print(node.key + ', ' + node.val)

The above should print:

2, f
3, z
4, a
8, r
9, c
11, i

Use the class skeleton below for your traversal implementation.

Note: Put your DFSTraversal class in the same file as the previous problems (P1.py)

from enum import Enum

class DFSTraversalTypes(Enum):
    PREORDER = 1
    INORDER = 2
    POSTORDER = 3

class DFSTraversal():
    def __init__(self, tree: BSTTable, traversalType: DFSTraversalTypes):
        # TODO: implement
        pass

    def __iter__(self):
        # TODO: implement
        pass

    def __next__(self):
        # TODO: implement
        pass

    def inorder(self, bst: BSTTable):
        # TODO: implement
        return

    def preorder(self, bst: BSTTable):
        # TODO: implement
        return

    def postorder(self, bst: BSTTable):
        # TODO: implement
        return

Deliverables

  • P1.py: should contain implementations for BSTTable, BSTNode and DFSTraversal classes. BSTTable should now have _removemin and _remove methods.

Problem 2 [30 pts]: Heaps

This question is all about the binary heap data structure. A (binary) heap is a (nearly) complete binary tree; this means all of it's non-leaf nodes have exactly two children which makes it suitable to store in a compact and memory efficient array. Each node of the tree is an element in the array.

The two types of binary heaps are: min-heap and max-heap. The heap-property is the most important invariant that holds at every node in the tree and is dependent on the type of heap we consider. In a min-heap the heap-property states:

  • In an array-backed min-heap, $A$, the following relation holds for every node (other than the root), $A[i]$: $A[Parent(i)] \leq A[i]$
  • the root is the minimum element of the tree

For a max-heap, the above comparison between parent ($A[Parent(i)]$) and child node ($A[i]$) is reversed and the root represents the tree's maximum element.

Your objective by the end of the question is to implement a binary max- and min-heap that support the following operations:

  • heapify: maintains the max/min-heap property
  • build_heap: turns an unordered array into a valid max/min-heap
  • heappush: inserts an element into the heap
  • heappop: removes and returns the heap's minimum/maximum element

Hint: Play around with this Heap visualization tool to get an intuition of its inner workings.

Hint: For an overview of operations on Heaps see these Heap lectures.

Hint: See the Wikipedia Binary Heap Page for useful guidance on potential solutions.

Hint: your build_heap output does not have to be consistent with our demo printouts since heap element arrangements are not unique. Your heap should simply support the heap-property at every node but can yield a number of different permutations of elements at every level (which on an interesting side-note can be computed using Heap's Algorithm named after B. R. Heap)

Part A [10 pts] : Building a min-heap

Your task: implement the heapify and build_heap functions for a min-heap.

Hint: By the end of the problem, we want you to have both a min- and max-heap. To avoid code duplication, notice that the only function dependent on the heap-type are the ones that need to preserve the heap-property (i.e., whenever we compare parent and child node). You will be reminded of this later on in the problem.

Copy the following Heap skeleton into a file called P2.py and complete the implementation:

from math import floor
from typing import List

class Heap:
    def __init__(self, array: List[int]) -> None:
        self.elements = array
        self.size = len(array) # Number of elements in heap
        self.build_heap()

    # index of left child of the node at idx
    def left(self, idx: int) -> int:
        return 2 * idx + 1

    # index of right child of the node at idx
    def right(self, idx: int) -> int:
        return 2 * idx + 2

    def parent(self, idx: int) -> int:
        return floor((idx - 1) / 2)

    def swap(self, idx1: int, idx2: int) -> None:
        tmp = self.elements[idx1]
        self.elements[idx1] = self.elements[idx2]
        self.elements[idx2] = tmp

    def to_string(self, prefix: str = "", idx: int = 0, left: bool = False) -> str:
        if self.size == 0:
            return '\\--'
        elif idx < self.size:
            buf = prefix
            if left:
                buf = buf + "|-- "
            else:
                buf = buf + "\\-- "
            buf = buf + '\033[1m' + str(self.elements[idx]) + '\033[0m' + '\n'

            new_prefix = prefix
            if left:
                new_prefix = new_prefix + "|   "
            else:
                new_prefix = new_prefix + "    "

            return buf + \
                    self.to_string(new_prefix, self.left(idx), True) + \
                    self.to_string(new_prefix, self.right(idx), False)
        else:
            return ''

    def __str__(self) -> str:
        return self.to_string()

    def __len__(self) -> str:
        return self.size

    def heapify(self, idx: int) -> None:
        # TODO: implement
        pass

    def build_heap(self) -> None:
        # TODO: implement
        pass

Demo of building a heap

h = Heap([-1,0,0,15,23,1,2,3]) # The heap tree will be built during initialization
print(h)

This should output a valid permutation of following graph:

\-- -1
    |-- 0
    |   |-- 3
    |   |   |-- 15
    |   \-- 23
    \-- 0
        |-- 1
        \-- 2

Part B: Modifying a heap

Now that you can build a min-heap, we want to add some more useful functionality to it such as inserting and removing elements.

Your task: implement the methods heappush and heappop for your Heap class.

  • heappush: inserts a new element into the heap (while maintining its heap-property!)
  • heappop:
    • this function should remove the heap's minimum element and return it to the caller
    • Raise an IndexError when trying to pop from an empty heap

Add the following methods to your Heap class from Part A and finish their implementation:

def heappush(self, key: int) -> None:
    # TODO: implement
    pass

def heappop(self) -> int:
    # TODO: implement
    pass

Part C: A Wild Max-Heap Appears

As hinted in Part A of this question most of the logic for building and modifying a binary heap is the same between min-heaps and max-heaps. The main difference between the two is the heap-property that has to hold for all nodes.

Your task: Rewrite your Heap class from Part A and Part B and create two classes, MinHeap and MaxHeap, according to the code skeleton below.

Notes

  • MinHeap and MaxHeap should be subclasses that implement the compare() method.
  • The compare() method should be used in your heapify and heappush methods to maintain the heap-property for the particular type of heap. The function should compare two node values and return True if the comparison VIOLATES the heap-property for your particular heap.
from math import floor
from typing import List

class Heap:
    def __init__(self, array: List[int]) -> None:
        self.elements = array
        self.size = len(array) # Number of elements in heap
        self.build_heap()

    # index of left child of the node at idx
    def left(self, idx: int) -> int:
        return 2 * idx + 1

    # index of right child of the node at idx
    def right(self, idx: int) -> int:
        return 2 * idx + 2

    def parent(self, idx: int) -> int:
        return floor((idx - 1) / 2)

    def swap(self, idx1: int, idx2: int) -> None:
        tmp = self.elements[idx1]
        self.elements[idx1] = self.elements[idx2]
        self.elements[idx2] = tmp

    def to_string(self, prefix: str = "", idx: int = 0, left: bool = False) -> str:
        if self.size == 0:
            return '\\--'
        elif idx < self.size:
            buf = prefix
            if left:
                buf = buf + "|-- "
            else:
                buf = buf + "\\-- "
            buf = buf + '\033[1m' + str(self.elements[idx]) + '\033[0m' + '\n'

            new_prefix = prefix
            if left:
                new_prefix = new_prefix + "|   "
            else:
                new_prefix = new_prefix + "    "

            return buf + \
                    self.to_string(new_prefix, self.left(idx), True) + \
                    self.to_string(new_prefix, self.right(idx), False)
        else:
            return ''

    def __str__(self) -> str:
        return self.to_string()

    def __len__(self) -> int:
        return self.size

    # TODO: override this in your dervied classes
    def compare(self, a: int, b: int) -> bool:
        raise NotImplementedError

    def heapify(self, idx: int) -> None:
        # TODO: your solution from the previous question can go here
        #       but remember to use your new "self.compare(a, b)" instead
        #       of raw comparisons
        pass

    def build_heap(self) -> None:
        # TODO: your solution from the previous question can go here
        pass

    def heappush(self, key: int) -> None:
        # TODO: your solution from the previous question can go here
        #       but remember to use your new "self.compare(a, b)" instead
        #       of raw comparisons
        pass

    def heappop(self) -> int:
        # TODO: your solution from the previous question can go here
        pass

class MinHeap(Heap):
    # TODO: complete implementation
    pass

class MaxHeap(Heap):
    # TODO: complete implementation
    pass

Demo: max- and min-heap classes

elements = [1,2,3,4,5]
mn = MinHeap(elements)
mx = MaxHeap(elements)

print(mn)
print(mx)

\-- 1
    |-- 2
    |   |-- 4
    |   \-- 5
    \-- 3

\-- 5
    |-- 4
    |   |-- 1
    |   \-- 2
    \-- 3

Deliverables

  • P2.py: Should contain the Heap, MinHeap and MaxHeap classes. The Heap class should be implemented as a base class as described in Part 2C.

Problem 3 [30 pts]: Priority Queue

A priority queue is an abstract data type that represents a set of elements where each element has some key associated with it (a priority). Dequeueing (i.e., removing) an element from the priority queue will return the element with highest (in a max-priority queue) or lowest (in a min-priority queue) key. Thus removing all elements of the priority queue will leave you with a sorted list. A priority queue gets its name from the fact that elements are inserted or removed with priority over others.

Our priority queue should support following three operations:

  1. put: add an element to the set
  2. get: retrieve and remove the element with the largest key (or smallest depending on the type of priority queue)
  3. peek: this should return the largest element (or smallest depending on the type of priority queue)

In our priority queue implementation, the key of an element will be the element itself. In a min-priority queue, an element, X, has a higher priority compared to element, Y, if X < Y. Examples:

  • 1 < 3 is True, so we assume 1 has higher priority
  • 'beef pho' < 'chicken pho' is True (Python compares strings with lexicographical order), so we assume 'beef pho' has higher priority
  • (1685, 'pho') < (1862, 'rice') is True (Python compares tuples position by position), so we assume (1685, 'pho') has higher priority.

Note: in a max-priority queue you would compare elements using > to compute higher priority.

In this problem you will be asked to implement three different versions of a min-priority queue.

Copy below code template into a file called P3.py.

from random import sample
from time import time

class PriorityQueue:
    def __init__(self, max_size):
        self.elements = []
        self.max_size = max_size

    def __len__(self):
        return len(self.elements)

    def __bool__(self):
        return len(self.elements) > 0

    def put(self, val):
        raise NotImplementedError # TODO

    def get(self):
        raise NotImplementedError # TODO

    def peek(self):
        raise NotImplementedError # TODO

def mergesortedlists(lists, pqclass=PriorityQueue):
    merged = []
    pq = pqclass(len(lists))
    for i, l in enumerate(lists): 
        pq.put((l.pop(0), i))

    while pq:
        ele, i = pq.get()
        merged.append(ele)
        if lists[i]:
            pq.put((lists[i].pop(0), i)) 

    return merged


def generatelists(n, length=20, dictionary_path='../data/words.txt'):
    with open(dictionary_path, 'r') as f:
        words = [w.strip() for w in f.readlines()]
    lists = []
    for _ in range(n):
        lists.append(sorted(sample(words, length)))
    return lists


def timeit(ns=(10, 20, 50, 100, 200, 500), pqclass=PriorityQueue, n_average=5):
    elapsed = []
    for n in ns:
        timeaccum = 0
        for _ in range(n_average):
            lists = generatelists(n)
            start = time()
            merged = mergesortedlists(lists, pqclass)
            end   = time()
            timeaccum += end-start
        elapsed.append(timeaccum / n_average)
    return elapsed

Part A [10 points]:

A naive way to implement a priority queue is to use a plain old list. Inserting an element appends to the list and extracting an element scans the whole list before returning (and removing) the highest priority element (remember what highest priority means in our min-priority queue).

Your task: implement a min-priority queue class called NaivePriorityQueue

Put all your code in a file called P3.py

Your class NaivePriorityQueue should inherit the base class PriorityQueue.

PriorityQueue will be initialized with a max_size which indicates the capacity (maximum number of elements allowed) of the priority queue.

Your NaivePriorityQueue class needs to implement the methods:

  1. put should take in a value val and insert it at the end of the elements list. Nothing should be returned.
  2. get should remove the smallest value from elements and return it.
  3. peek should return the smallest value in the queue.

Raise an IndexError in the following three cases:

  • put called on a full priority queue
  • get called on an empty priority queue
  • peek called on an empty priority queue

Demo

q = NaivePriorityQueue(2)
q.put(1)
q.put(2)
print(q.peek())

1

print(q.get())
1
print(q.get())
2

What should this final statement return?

print(q.get())

Part B [10 points]: Heap-backed priority queue

A better way to implement a priority queue is to use a binary heap.

Your task: Implement a min-priority queue using a min-heap. You will implement two versions: (1) using your own heap from question 2 (2) using the heap from Python's standard library

With the heappush and heappop methods from the Heap class, we can effectively treat a Python list as a heap. If we are using a heap to implement a priority queue:

  • put should push the element to the heap (heappush)
  • get should remove and return the root element of the heap (heappop)
  • peek should return the smallest element of your min-heap (largest element in a max-heap)

Please implement two priority queues called HeapPriorityQueue and PythonHeapPriorityQueue that use the your own heap implementation and Python's built-in heap respectively. As in Part A please use PriorityQueue as your base class in the same file P3.py.

Raise IndexErrors as you did in the previous problem where necessary.

Part C [10 points]:

Do this problem in a separate Python file called: P3-C.py.

In this problem, we will motivate the use of a priority queue and make you investigate the impact of the underlying implementation choice.

Suppose we want to merge multiple sorted lists into one large sorted list. One way to do it is to maintain a priority queue which holds the head element in each list. We then constantly pop from this queue (to append to the target list) and insert the next element of the corresponding list into the queue. The target list will be sorted because we are always first appending the smallest head element of all lists (given that the lists are sorted, this is the smallest element not merged).

We will use the problem of merging lists using priority queues as our benchmark to compare your three priority queue implementations from the previous problems.

Your task: compare the performance of NaivePriorityQueue, HeapPriorityQueue and PythonHeapPriorityQueue by timing the mergedsortedlists method using the timeit method (both of which were provided in the PriorityQueue skeleton)

The mergesortedlists function takes a priority queue class as an argument (pqclass), and performs the merging using that priority queue implementation. Here we test the performance of mergesortedlists with lists that contains sorted words from the words.txt file provided. In your final solution, words.txt can be at the same directory level as your solution file.

The timeit function takes a priority queue class as an argument (pqclass) and measures the running time of mergesortedlists for that priority queue. Another argument for timeit is ns, which is an ascending sequence of the numbers of lists to merge (this will be the x-axis for your performance plot described below).

Make a line plot to compare the running time of mergesortedlists with NaivePriorityQueue, HeapPriorityQueue and PythonHeapPriorityQueue for an increasing number of lists. In the plot, you should have three curves: one for each queue implementation. You should plot the elapsed time against number of lists (the sequence argument, ns, passed to timeit).

Graph format:

  • Don't use logarithmic scales
  • Label your x- and y-axes
  • Create a descriptive title
  • x-axis: Number of Lists Merged (ranges from 0 to 500; see timeit's ns parameter)
  • y-axis: Elapsed time in seconds

Save the plot as P3-C.png.

Note: generatelists reads words from words.txt. You should have that in your HW6-final folder.

Deliverables

  • P3.py: should contain NaivePriorityQueue, HeapPriorityQueue and PythonHeapPriorityQueue
  • P3-C.py: should contain the code for Part C
  • P3-C.png: should contain a plot comparing the two priority queue runtimes