Homework 6¶
Due Tuesday, November 24th 2020 at 11:59 PM.¶
Be sure to push the final version of your code to your GitHub repo. Follow the instructions on the course website.¶
Topics¶
Problem 0. Homework Workflow [10 pts]¶
- Prerequisite
Problem 1: BST Extensions [30 pts]¶
Problem 2: Heaps [30 pts]¶
Problem 3: Priority Queue [30 pts]¶
- Part A. Naive priority queue
- Part B. Priority queue based on min-heap
- Part C. Compare two implementations of a priority queue
Note: We will be using autograders for the problems below so make sure you conform with the interfaces as described in the question statement. DO NOT change any of the given code (i.e. class names, methods names, etc.). However, you are free to implement any helper methods as needed.
Note: This homework may look very lengthy, but this is because we provide substantial code to you to help you start the problems.
Problem 0: Homework Workflow¶
Once you receive HW5 feedback, you will need to merge your HW5-dev
branch into master
.
You will earn points for following all stages of the git workflow which involves:
- 3pts for merging
HW5-dev
intomaster
- 5pts for completing HW6 on
HW6-dev
- 2pts for making a PR on
HW6-dev
to merge intomaster
Prerequisite Questions¶
We would like your feedback on the Auto-eD tool that you used in Homework 4. Follow this link and answer the questions.
Problem 1 [30 pts]: BST Extensions¶
This problem builds on Problem 3 of Homework 5 in which you wrote a binary search tree. This time, you will first write some more supporting functions for your BST and then write a class to perform traversals on it.
Put the following code and your edits for this problem in a file titled P1.py
. Failure to follow this instruction will result in point deduction.
class BSTNode:
def __init__(self, key, val):
self.key, self.val = key, val
self.left, self.right = None, None
self.size = 1
def __str__(self):
return f'BSTNode({self.key}, {self.val})' + \
'\n|\n|-(L)->' + '\n| '.join(str(self.left ).split('\n')) + \
'\n|\n|-(R)->' + '\n| '.join(str(self.right).split('\n'))
class BSTTable:
def __init__(self):
self._root = None
def __str__(self):
return str(self._root)
def __len__(self):
return self._size(self._root)
def put(self, key, val):
self._root = self._put(self._root, key, val)
def get(self, key):
return self._get(self._root, key)
def _put(self, node, key, val):
if not node:
return BSTNode(key, val)
if key < node.key:
node.left = self._put(node.left, key, val)
elif key > node.key:
node.right = self._put(node.right, key, val)
else:
node.val = val
node.size = 1 + self._size(node.left) + self._size(node.right)
return node
def _get(self, node, key):
if not node:
raise KeyError(f'key not found: {key}')
if key < node.key:
return self._get(node.left, key)
elif key > node.key:
return self._get(node.right, key)
else:
return node.val
@staticmethod
def _size(node):
return node.size if node else 0
Part A [10 pts]: Remove minimum node (_removemin
)¶
Your task: Please implement the method _removemin
in your BSTTable
class. It takes a node
as an argument. It will return the subtree with the smallest node (the node with the smallest key) removed.
In Part B of this problem, you will see that the _removemin
function will help us reduce the complexity of a related function called _remove
.
Hint: Remember to update the size
of the nodes.
Hint: Remember that the _removemin
function should modify the tree.
Hint: check out these slides, especially the BST node removal
Demo¶
t = BSTTable()
t.put(5, 'a')
t.put(1, 'b')
t.put(2, 'c')
t.put(0, 'd')
print(t._root)
BSTNode(5, a)
|
|-(L)->BSTNode(1, b)
| |
| |-(L)->BSTNode(0, d)
| | |
| | |-(L)->None
| | |
| | |-(R)->None
| |
| |-(R)->BSTNode(2, c)
| | |
| | |-(L)->None
| | |
| | |-(R)->None
|
|-(R)->None
print(t._removemin(t._root))
BSTNode(5, a)
|
|-(L)->BSTNode(1, b)
| |
| |-(L)->None
| |
| |-(R)->BSTNode(2, c)
| | |
| | |-(L)->None
| | |
| | |-(R)->None
|
|-(R)->None
Part B [10 pts]: Remove Node¶
The remove
method removes the node with the corresponding key. It delegates the function calls to _remove
, which takes a node
and a key
as an argument and returns the subtree without the node whose key is key
.
Your task: Add and implement the following to the BSTTable
class:
def remove(self, key):
self._root = self._remove(self._root, key)
def _remove(self, node, key)
# TODO: Should return a subtree whose root is <node> but without
# the node whose key is <key>
pass
The method should raise a KeyError
if no node whose key is key
is found.
Hints:
- As in
_removemin
, remember to update thesize
of the nodes. - Remember that after the removal of a node, the returned subtree should still be a binary search tree.
- Your
_remove
should be recursive. Think about which situations you should be able to handle.- What if the current node's key is greater/smaller than the key we want to remove?
- What if we reach the node we are looking for? How do you make sure the subtree you return is still a binary tree?
- How can you use
_removemin
to make your life a little easier? Note: you aren't required to use_removemin
here but it can reduce the amount of code duplication
Demo¶
t = BSTTable()
t.put(5, 'a')
t.put(1, 'b')
t.put(2, 'c')
t.put(0, 'd')
print(t._remove(t._root, 5))
BSTNode(1, b)
|
|-(L)->BSTNode(0, d)
| |
| |-(L)->None
| |
| |-(R)->None
|
|-(R)->BSTNode(2, c)
| |
| |-(L)->None
| |
| |-(R)->None
print(t._remove(t._remove(t._root, 5), 1))
BSTNode(2, c)
|
|-(L)->BSTNode(0, d)
| |
| |-(L)->None
| |
| |-(R)->None
|
|-(R)->None
What should this return?
print(t._remove(t._root, 10))
Part C [10 pts]: BST Traversal¶
During lectures we discussed three types of depth-first traversal (DFS): preorder, inorder, and postorder. In addition to the lecture notes, here is another reference: Tree Traversal.
Your task: Write an iterator class called DFSTraversal
that allows us to traverse a BST according to a user-specified order.
The DFSTraversal
class should follow following specifications:
- The constructor takes a
BSTTable
object calledtree
and one of the enums fromDFSTraversalTypes
(see skeleton below).- Note: You should be using the enums and not regular integers
1
,2
, or3
.
- Note: You should be using the enums and not regular integers
__iter__(self)
: This is the initialization of an iterator__next__(self)
: This is called in the iterator for getting the nextBSTNode
in the tree. This dunder method should raise aStopIteration
error when the entire BST has been traversed.inorder
,preorder
,postorder
: these will perform the actual traversal
Hint: inorder
, preorder
, postorder
should not return anything; they perform traversal of your tree. To figure out the action they perform look at the demo below
Demo of DFSTraversal
¶
A use case of the DFSTraversal
class:
input_array = [(4, 'a'), (9, 'c'), (2, 'f'), (3, 'z'), (11, 'i'), (8, 'r')]
bst = BSTTable()
for key, val in input_array:
bst.put(key, val)
traversal = DFSTraversal(bst, DFSTraversalTypes.INORDER)
for node in traversal:
print(str(node.key) + ', ' + node.val)
The above should print:
2, f
3, z
4, a
8, r
9, c
11, i
Use the class skeleton below for your traversal implementation.
Note: Put your DFSTraversal
class in the same file as the previous problems (P1.py
)
from enum import Enum
class DFSTraversalTypes(Enum):
PREORDER = 1
INORDER = 2
POSTORDER = 3
class DFSTraversal():
def __init__(self, tree: BSTTable, traversalType: DFSTraversalTypes):
# TODO: implement
pass
def __iter__(self):
# TODO: implement
pass
def __next__(self):
# TODO: implement
pass
def inorder(self, bst: BSTTable):
# TODO: implement
return
def preorder(self, bst: BSTTable):
# TODO: implement
return
def postorder(self, bst: BSTTable):
# TODO: implement
return
Deliverables¶
P1.py
: should contain implementations forBSTTable
,BSTNode
andDFSTraversal
classes.BSTTable
should now have_removemin
and_remove
methods.
Problem 2 [30 pts]: Heaps¶
This question is all about the binary heap data structure. A (binary) heap is a (nearly) complete binary tree; this means all of it's non-leaf nodes have exactly two children which makes it suitable to store in a compact and memory efficient array. Each node of the tree is an element in the array.
The two types of binary heaps are: min-heap and max-heap. The heap-property is the most important invariant that holds at every node in the tree and is dependent on the type of heap we consider. In a min-heap the heap-property states:
- In an array-backed min-heap, $A$, the following relation holds for every node (other than the root), $A[i]$: $A[Parent(i)] \leq A[i]$
- the root is the minimum element of the tree
For a max-heap, the above comparison between parent ($A[Parent(i)]$) and child node ($A[i]$) is reversed and the root represents the tree's maximum element.
Your objective by the end of the question is to implement a binary max- and min-heap that support the following operations:
heapify
: maintains the max/min-heap propertybuild_heap
: turns an unordered array into a valid max/min-heapheappush
: inserts an element into the heapheappop
: removes and returns the heap's minimum/maximum element
Hint: Play around with this Heap visualization tool to get an intuition of its inner workings.
Hint: For an overview of operations on Heaps see these Heap lectures.
Hint: See the Wikipedia Binary Heap Page for useful guidance on potential solutions.
Hint: your build_heap
output does not have to be consistent with our demo printouts since heap element arrangements are not unique. Your heap should simply support the heap-property at every node but can yield a number of different permutations of elements at every level (which on an interesting side-note can be computed using Heap's Algorithm named after B. R. Heap)
Part A [10 pts] : Building a min-heap¶
Your task: implement the heapify
and build_heap
functions for a min-heap.
Hint: By the end of the problem, we want you to have both a min- and max-heap. To avoid code duplication, notice that the only function dependent on the heap-type are the ones that need to preserve the heap-property (i.e., whenever we compare parent and child node). You will be reminded of this later on in the problem.
Copy the following Heap
skeleton into a file called P2.py and complete the implementation:
from math import floor
from typing import List
class Heap:
def __init__(self, array: List[int]) -> None:
self.elements = array
self.size = len(array) # Number of elements in heap
self.build_heap()
# index of left child of the node at idx
def left(self, idx: int) -> int:
return 2 * idx + 1
# index of right child of the node at idx
def right(self, idx: int) -> int:
return 2 * idx + 2
def parent(self, idx: int) -> int:
return floor((idx - 1) / 2)
def swap(self, idx1: int, idx2: int) -> None:
tmp = self.elements[idx1]
self.elements[idx1] = self.elements[idx2]
self.elements[idx2] = tmp
def to_string(self, prefix: str = "", idx: int = 0, left: bool = False) -> str:
if self.size == 0:
return '\\--'
elif idx < self.size:
buf = prefix
if left:
buf = buf + "|-- "
else:
buf = buf + "\\-- "
buf = buf + '\033[1m' + str(self.elements[idx]) + '\033[0m' + '\n'
#buf = buf + str(self.elements[idx]) + '\n' #use if above doesn't work
new_prefix = prefix
if left:
new_prefix = new_prefix + "| "
else:
new_prefix = new_prefix + " "
return buf + \
self.to_string(new_prefix, self.left(idx), True) + \
self.to_string(new_prefix, self.right(idx), False)
else:
return ''
def __str__(self) -> str:
return self.to_string()
def __len__(self) -> str:
return self.size
def heapify(self, idx: int) -> None:
# TODO: implement
pass
def build_heap(self) -> None:
# TODO: implement
pass
Demo of building a heap¶
h = Heap([-1,0,0,15,23,1,2,3]) # The heap tree will be built during initialization
print(h)
This should output a valid permutation of following graph:
\-- -1
|-- 0
| |-- 3
| | |-- 15
| \-- 23
\-- 0
|-- 1
\-- 2
Part B: Modifying a heap¶
Now that you can build a min-heap, we want to add some more useful functionality to it such as inserting and removing elements.
Your task: implement the methods heappush
and heappop
for your Heap
class.
heappush
: inserts a new element into the heap (while maintining its heap-property!)heappop
:- this function should remove the heap's minimum element and return it to the caller
- Raise an
IndexError
when trying to pop from an empty heap
Add the following methods to your Heap
class from Part A and finish their implementation:
def heappush(self, key: int) -> None:
# TODO: implement
pass
def heappop(self) -> int:
# TODO: implement
pass
Part C: A Wild Max-Heap Appears¶
As hinted in Part A of this question most of the logic for building and modifying a binary heap is the same between min-heaps and max-heaps. The main difference between the two is the heap-property that has to hold for all nodes.
Your task: Rewrite your Heap
class from Part A and Part B and create two classes, MinHeap
and MaxHeap
, according to the code skeleton below.
Notes¶
MinHeap
andMaxHeap
should be subclasses that implement thecompare()
method.- The
compare()
method should be used in yourheapify
andheappush
methods to maintain the heap-property for the particular type of heap. The function should compare two node values and returnTrue
if the comparison VIOLATES the heap-property for your particular heap.
from math import floor
from typing import List
class Heap:
def __init__(self, array: List[int]) -> None:
self.elements = array
self.size = len(array) # Number of elements in heap
self.build_heap()
# index of left child of the node at idx
def left(self, idx: int) -> int:
return 2 * idx + 1
# index of right child of the node at idx
def right(self, idx: int) -> int:
return 2 * idx + 2
def parent(self, idx: int) -> int:
return floor((idx - 1) / 2)
def swap(self, idx1: int, idx2: int) -> None:
tmp = self.elements[idx1]
self.elements[idx1] = self.elements[idx2]
self.elements[idx2] = tmp
def to_string(self, prefix: str = "", idx: int = 0, left: bool = False) -> str:
if self.size == 0:
return '\\--'
elif idx < self.size:
buf = prefix
if left:
buf = buf + "|-- "
else:
buf = buf + "\\-- "
buf = buf + '\033[1m' + str(self.elements[idx]) + '\033[0m' + '\n'
#buf = buf + str(self.elements[idx]) + '\n' #use if above doesn't work
new_prefix = prefix
if left:
new_prefix = new_prefix + "| "
else:
new_prefix = new_prefix + " "
return buf + \
self.to_string(new_prefix, self.left(idx), True) + \
self.to_string(new_prefix, self.right(idx), False)
else:
return ''
def __str__(self) -> str:
return self.to_string()
def __len__(self) -> int:
return self.size
# TODO: override this in your dervied classes
def compare(self, a: int, b: int) -> bool:
raise NotImplementedError
def heapify(self, idx: int) -> None:
# TODO: your solution from the previous question can go here
# but remember to use your new "self.compare(a, b)" instead
# of raw comparisons
pass
def build_heap(self) -> None:
# TODO: your solution from the previous question can go here
pass
def heappush(self, key: int) -> None:
# TODO: your solution from the previous question can go here
# but remember to use your new "self.compare(a, b)" instead
# of raw comparisons
pass
def heappop(self) -> int:
# TODO: your solution from the previous question can go here
pass
class MinHeap(Heap):
# TODO: complete implementation
pass
class MaxHeap(Heap):
# TODO: complete implementation
pass
Demo: max- and min-heap classes¶
mn = MinHeap([1,2,3,4,5])
mx = MaxHeap([1,2,3,4,5])
print(mn)
print(mx)
\-- 1
|-- 2
| |-- 4
| \-- 5
\-- 3
\-- 5
|-- 4
| |-- 1
| \-- 2
\-- 3
Problem 3 [30 pts]: Priority Queue¶
A priority queue is an abstract data type that represents a set of elements where each element has some key associated with it (a priority). Dequeueing (i.e., removing) an element from the priority queue will return the element with highest (in a max-priority queue) or lowest (in a min-priority queue) key. Thus removing all elements of the priority queue will leave you with a sorted list. A priority queue gets its name from the fact that elements are inserted or removed with priority over others.
Our priority queue should support following three operations:
- put: add an element to the set
- get: retrieve and remove the element with the largest key (or smallest depending on the type of priority queue)
- peek: this should return the largest element (or smallest depending on the type of priority queue)
In our priority queue implementation, the key of an element will be the element itself. In a min-priority queue, an element, X, has a higher priority compared to element, Y, if X < Y
. Examples:
1 < 3
isTrue
, so we assume1
has higher priority'beef pho' < 'chicken pho'
isTrue
(Python compares strings with lexicographical order), so we assume'beef pho'
has higher priority(1685, 'pho') < (1862, 'rice')
isTrue
(Python compares tuples position by position), so we assume(1685, 'pho')
has higher priority.
Note: elements in your queue should all be of the same data type. That is, comparisons between different data types do not need to be handled in your queue implementation.
Note: in a max-priority queue you would compare elements using >
to compute higher priority.
In this problem you will be asked to implement three different versions of a min-priority queue.
Copy below code template into a file called P3.py
.
from random import sample
from time import time
class PriorityQueue:
def __init__(self, max_size):
self.elements = []
self.max_size = max_size
def __len__(self):
return len(self.elements)
def __bool__(self):
return len(self.elements) > 0
def put(self, val):
raise NotImplementedError # TODO
def get(self):
raise NotImplementedError # TODO
def peek(self):
raise NotImplementedError # TODO
def mergesortedlists(lists, pqclass=PriorityQueue):
merged = []
pq = pqclass(len(lists))
for i, l in enumerate(lists):
pq.put((l.pop(0), i))
while pq:
ele, i = pq.get()
merged.append(ele)
if lists[i]:
pq.put((lists[i].pop(0), i))
return merged
def generatelists(n, length=20, dictionary_path='../data/words.txt'):
with open(dictionary_path, 'r') as f:
words = [w.strip() for w in f.readlines()]
lists = []
for _ in range(n):
lists.append(sorted(sample(words, length)))
return lists
def timeit(ns=(10, 20, 50, 100, 200, 500), pqclass=PriorityQueue, n_average=5):
elapsed = []
for n in ns:
timeaccum = 0
for _ in range(n_average):
lists = generatelists(n)
start = time()
merged = mergesortedlists(lists, pqclass)
end = time()
timeaccum += end-start
elapsed.append(timeaccum / n_average)
return elapsed
Part A [10 points]:¶
A naive way to implement a priority queue is to use a plain old list. Inserting an element appends to the list and extracting an element scans the whole list before returning (and removing) the highest priority element (remember what highest priority means in our min-priority queue).
Your task: implement a min-priority queue class called NaivePriorityQueue
Put all your code in a file called P3.py
Your class NaivePriorityQueue
should inherit the base class PriorityQueue
.
PriorityQueue
will be initialized with a max_size
which indicates the capacity (maximum number of elements allowed) of the priority queue.
Your NaivePriorityQueue
class needs to implement the methods:
put
should take in a valueval
and insert it at the end of theelements
list. This method should raise an error if called aftermax_size
is reached. Nothing should be returned.get
should remove the smallest value fromelements
and return it.peek
should return the smallest value in the queue.
Raise an IndexError
in the following three cases:
put
called on a full priority queueget
called on an empty priority queuepeek
called on an empty priority queue
Demo¶
q = NaivePriorityQueue(2)
q.put(1)
q.put(2)
print(q.peek())
1
print(q.get())
1
print(q.get())
2
What should this final statement return?
print(q.get())
Part B [10 points]: Heap-backed priority queue¶
A better way to implement a priority queue is to use a binary heap.
Your task: Implement a min-priority queue using a min-heap. You will implement two versions: (1) using your own heap from question 2 (2) using the heap from Python's standard library
With the heappush
and heappop
methods from the Heap
class, we can effectively treat a Python list as a heap. If we are using a heap to implement a priority queue:
put
should push the element to the heap (heappush
)get
should remove and return the root element of the heap (heappop
)peek
should return the smallest element of your min-heap (Your implementation should be flexible enough to be able to return largest element in a if we replace the min-heap with a max-heap)
Please implement two priority queues called HeapPriorityQueue
and PythonHeapPriorityQueue
that use your own heap implementation and Python's built-in heap respectively. As in Part A please use PriorityQueue
as your base class in the same file P3.py.
Raise IndexError
s as you did in the previous problem where necessary.
Part C [10 points]:¶
Do this problem in a separate Python file called: P3-C.py.
In this problem, we will motivate the use of a priority queue and make you investigate the impact of the underlying implementation choice.
Suppose we want to merge multiple sorted lists into one large sorted list. One way to do it is to maintain a priority queue which holds the head element in each list. We then constantly pop from this queue (to append to the target list) and insert the next element of the corresponding list into the queue. The target list will be sorted because we are always first appending the smallest head element of all lists (given that the lists are sorted, this is the smallest element not merged).
We will use the problem of merging lists using priority queues as our benchmark to compare your three priority queue implementations from the previous problems.
Your task: compare the performance of NaivePriorityQueue
, HeapPriorityQueue
and PythonHeapPriorityQueue
by timing the mergedsortedlists
method using the timeit
method (both of which were provided in the PriorityQueue
skeleton)
The mergesortedlists
function takes a priority queue class as an argument (pqclass
), and performs the merging using that priority queue implementation. Here we test the performance of mergesortedlists
with lists that contains sorted words from the words.txt
file provided. In your final solution, words.txt
can be at the same directory level as your solution file.
The timeit
function takes a priority queue class as an argument (pqclass
) and measures the running time of mergesortedlists
for that priority queue. Another argument for timeit
is ns
, which is an ascending sequence of the numbers of lists to merge (this will be the x-axis for your performance plot described below).
Make a line plot to compare the running time of mergesortedlists
with NaivePriorityQueue
, HeapPriorityQueue
and PythonHeapPriorityQueue
for an increasing number of lists. In the plot, you should have three curves: one for each queue implementation. You should plot the elapsed time against number of lists (the sequence argument, ns
, passed to timeit
).
Graph format:
- Don't use logarithmic scales
- Label your x- and y-axes
- Create a descriptive title
- x-axis: Number of Lists Merged (ranges from 0 to 500; see
timeit
'sns
parameter) - y-axis: Elapsed time in seconds
Save the plot as P3-C.png
.
Note: generatelists
reads words from words.txt
. You should have that in your HW6-final folder.