Key Word(s): Binary search trees, BST Traversal, Heaps, Priority Queues, Generators
BST Traversal¶
- We've stored our data in a BST
- This seemed like a good idea at the time because BSTs have some nice properties
- To be able to access/use our data, we need to be able to traverse the tree
Traversal Choices¶
There are three traversal choices based on an implicit ordering of the tree from left to right:
- In-order: Traverse left subtree, then current root, then traverse right subtree
- Post-order: Traverse left subtree, then traverse right subtree, then current root
- Pre-order: Current root, then traverse left subtree, then traverse right subtree
- Traversing a tree means performing some operation
- In our examples, the operation will be "displaying the data"
- However, an operation could be "deleting files"
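The three orders can be sketched as short recursive functions. This is a minimal illustration, not the course's reference implementation: the `Node` class and the `visit` callback are assumptions made for this example, and `visit` stands in for whatever operation is performed at each node.

```python
class Node:
    """A bare-bones binary tree node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def in_order(node, visit):
    """Left subtree, then current root, then right subtree."""
    if node is None:
        return
    in_order(node.left, visit)
    visit(node.key)
    in_order(node.right, visit)

def pre_order(node, visit):
    """Current root, then left subtree, then right subtree."""
    if node is None:
        return
    visit(node.key)
    pre_order(node.left, visit)
    pre_order(node.right, visit)

def post_order(node, visit):
    """Left subtree, then right subtree, then current root."""
    if node is None:
        return
    post_order(node.left, visit)
    post_order(node.right, visit)
    visit(node.key)

# A small BST:    4
#               /   \
#              2     6
#             / \   / \
#            1   3 5   7
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))

in_order(root, print)  # the operation here is "displaying the data"
```

On a BST, the in-order traversal visits the keys in sorted order, which is one reason it is the most commonly used of the three.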
Exercise 1¶
Heaps¶
We listed several types of data structures at the beginning of our data structures unit.
So far, we have discussed lists and trees (in particular binary trees and binary search trees).
Heaps are a type of binary tree, a little different from binary search trees.
Some Motivation: priority queues¶
- People may come to your customer service counter in a certain order, but you might want to serve your VIPs first!
- In other words, there is an "ordering" on your customers and you want to serve them in order of priority, most important first.
- This problem requires us to then sort things by importance and then evaluate things in this sorted order.
- A priority queue is a data structure for this, which allows us to do things more efficiently than simply re-sorting every time a new item comes in.
In an ordinary queue, items are inserted at one end and deleted from the other; a priority queue instead removes items in order of their keys.
The basic priority queue supports three primary operations:
- Insert: insert an item with "key" (e.g. an importance) $k$ into priority queue $Q$.
- Find Minimum: get the item whose key value is smaller than any other key in $Q$.
- Note: Depending on implementation, this may also be Find Maximum.
- Delete Minimum: Remove the item with minimum $k$ from $Q$.
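As a rough illustration of these three operations, here is a sketch using Python's standard-library `heapq` module, which treats a plain list as a min-heap. The customer labels and key values are made up for this example; smaller keys mean higher priority here.

```python
import heapq

queue = []  # heapq operates on an ordinary list

# Insert: push (key, item) pairs; tuples compare by key first.
heapq.heappush(queue, (3, "regular customer"))
heapq.heappush(queue, (1, "VIP"))
heapq.heappush(queue, (2, "frequent customer"))

# Find Minimum: the smallest pair always sits at index 0.
smallest = queue[0]
print(smallest)

# Delete Minimum: heappop removes and returns the smallest pair.
served = [heapq.heappop(queue) for _ in range(3)]
print(served)
```

`heapq` only provides a min-heap; a common workaround when the largest key should come out first is to store negated keys.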
Comments on Implementation of Priority Queues¶
One could use an unsorted array and store a pointer to the minimum index; accessing the minimum is an $\mathcal{O}(1)$ operation.
- It's cheap to update the pointer when new items are inserted into the array because we update it in $\mathcal{O}(1)$ only when the new value is less than the current one.
- Finding a new minimum after deleting the old one requires a scan of the array ($\mathcal{O}(n)$ operation) and then resetting the pointer.
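The unsorted-array approach might look roughly like the sketch below. The class name `UnsortedArrayPQ` and its method names are hypothetical, chosen only for this illustration, and the sketch assumes `find_min` and `delete_min` are called on a non-empty queue.

```python
class UnsortedArrayPQ:
    """Priority queue on an unsorted list plus an index of the current minimum."""

    def __init__(self):
        self.items = []
        self.min_index = None  # the "pointer" to the minimum

    def insert(self, key):
        self.items.append(key)
        # O(1): only one comparison against the current minimum is needed.
        if self.min_index is None or key < self.items[self.min_index]:
            self.min_index = len(self.items) - 1

    def find_min(self):
        # O(1): follow the pointer.
        return self.items[self.min_index]

    def delete_min(self):
        smallest = self.items.pop(self.min_index)
        # O(n): rescan the whole array to find the new minimum.
        if self.items:
            self.min_index = min(range(len(self.items)),
                                 key=self.items.__getitem__)
        else:
            self.min_index = None
        return smallest

pq = UnsortedArrayPQ()
for key in [5, 2, 8]:
    pq.insert(key)
print(pq.find_min())
print(pq.delete_min(), pq.find_min())
```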
One could alternatively implement the priority queue with a balanced binary tree structure. Then insertion and deletion of the minimum each take $\displaystyle\mathcal{O}\left(\log(n)\right)$ time!
This leads us to heaps. Heaps are a type of balanced binary tree.
- A heap providing access to minimum values is called a min-heap
- A heap providing access to maximum values is called a max-heap
- Note that a single heap cannot be both a min-heap and a max-heap
Priority queues are often implemented using heaps.
Heapsort¶
- Sorting with a priority queue built on an unsorted array amounts to selection sort and takes $\displaystyle\mathcal{O}\left(n^{2}\right)$ operations
- Using a heap takes $\mathcal{O}(n\log(n))$ operations
Implementing a sorting algorithm using a heap is called heapsort.
Heapsort is an in-place sort and requires only a constant amount of extra memory.
Note that there are many sorting algorithms nowadays. Python uses Timsort.
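The idea behind heapsort can be sketched with the standard-library `heapq` module: build a heap, then repeatedly remove the minimum. Note that this sketch is not in-place; it copies the input and builds a new output list for simplicity, whereas a textbook heapsort rearranges the original array. The function name `heapsort` is just a label for this example.

```python
import heapq

def heapsort(values):
    """Return a sorted list by popping a min-heap n times (not in-place)."""
    heap = list(values)   # copy so the caller's data is left untouched
    heapq.heapify(heap)   # arrange the copy so it satisfies the heap property
    # Each pop costs O(log n), so n pops cost O(n log n) overall.
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsort([5, 3, 8, 1, 9, 2]))
```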
Let's get back to heaps.
A heap has two properties.
- Shape property
  - A leaf node at depth $k>0$ can exist only if all the nodes at the previous depth exist. Nodes at any partially filled level are added "from left to right".
- Heap property
  - For a min-heap, each node in the tree contains a key less than or equal to either of its two children (if they exist).
  - This is also known as the labeling of a "parent node" dominating that of its children.
  - For a max-heap, a parent node must be greater than or equal to its children.
Heap Mechanics¶
- Heaps are a special binary tree that can be stored in arrays
  - This is more memory-efficient than the Node class and pointer logic used in BSTs
- The first element in the array is the root key.
- The next two elements make up the first level of children. This is done from left to right.
- Then the next four and so on.
(Picture taken from https://towardsdatascience.com/data-structure-heap-23d4c78a6962)
Note: If a parent node is at index $i$ (with the root at index $1$), then its children will be at indices $2i$ and $2i+1$.
Question: What if the indexing starts from $0$?
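The 1-based convention in the note can be sketched in Python by leaving index 0 of the list unused. The helper names `parent`, `left`, `right`, and `is_min_heap` are assumptions for this illustration, and the example keys are arbitrary.

```python
# Index 0 is an unused placeholder so that the root sits at index 1.
heap = [None, 1, 3, 2, 7, 4, 5]

def parent(i):
    return i // 2

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1

def is_min_heap(heap):
    """Check the min-heap property for a 1-based array layout."""
    n = len(heap) - 1  # number of stored keys
    # Every non-root node must be at least as large as its parent.
    for i in range(2, n + 1):
        if heap[parent(i)] > heap[i]:
            return False
    return True

print(heap[left(1)], heap[right(1)])  # children of the root
print(is_min_heap(heap))
```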
Construct a Heap¶
To construct a heap, insert each new element that comes in at the left-most open spot on the lowest level (the end of the array).
This maintains the shape property but may violate the heap property.
Restore the Heap Property by "Bubbling Up"¶
Look at the parent, and if the child "dominates" it, swap parent and child. Repeat this process until the child no longer dominates its parent or we have bubbled up to the root.
Identifying the dominant key is now easy because it will be at the top of the tree.
This process is called heapify and must also be done at the first construction of the heap.
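A minimal sketch of insertion with bubbling up, for a min-heap, is shown below. It assumes the 1-based array layout (index 0 unused, parent of index $i$ at $i // 2$); the function name `heap_insert` is hypothetical.

```python
def heap_insert(heap, key):
    """Insert key into a 1-based min-heap stored in a list (heap[0] is unused)."""
    heap.append(key)  # the left-most open spot is the end of the array
    i = len(heap) - 1
    # Bubble up while the child dominates (is smaller than) its parent.
    while i > 1 and heap[i] < heap[i // 2]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2

heap = [None]  # empty heap with the placeholder at index 0
for key in [5, 3, 8, 1]:
    heap_insert(heap, key)
print(heap)  # [None, 1, 3, 8, 5]
```

Each insertion performs at most one swap per level of the tree, so it costs $\mathcal{O}(\log(n))$.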
Deletion¶
Removing the dominant key creates a hole at the top (the first position in the array).
Fill this hole with the last element in the array, which is the rightmost leaf node on the lowest level.
This usually destroys the heap property!
So we now bubble this key down, swapping it with its dominant child, until it dominates all its children.
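Deletion with bubbling down might be sketched as follows for a min-heap. As before, this assumes a 1-based array layout with index 0 unused, and the function name `heap_delete_min` is made up for this example.

```python
def heap_delete_min(heap):
    """Remove and return the smallest key of a 1-based min-heap (heap[0] is unused)."""
    if len(heap) < 2:
        raise IndexError("delete from an empty heap")
    smallest = heap[1]
    last = heap.pop()        # last element = rightmost leaf on the lowest level
    if len(heap) > 1:
        heap[1] = last       # fill the hole at the root
        i, n = 1, len(heap) - 1
        while 2 * i <= n:    # while node i has at least one child
            child = 2 * i
            # Pick the dominant (smaller) of the two children.
            if child + 1 <= n and heap[child + 1] < heap[child]:
                child += 1
            if heap[i] <= heap[child]:
                break        # the key now dominates its children
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest

heap = [None, 1, 3, 8, 5]
print(heap_delete_min(heap))  # 1
print(heap)                   # [None, 3, 5, 8]
```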
Exercise 2¶
Iterables/Iterators Again¶
We have been discussing data structures and simultaneously exploring iterators and iterables.
import reprlib  # needed by Sentence.__repr__

class SentenceIterator:
    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

    def __iter__(self):
        return self

class Sentence:  # An iterable
    def __init__(self, text):
        self.text = text
        self.words = text.split()

    def __iter__(self):
        return SentenceIterator(self.words)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
Example Usage¶
a = Sentence("Dogs will save the world and cats will eat it.")
for item in a:
    print(item)
print("\n")
it = iter(a)  # it is an iterator
while True:
    try:
        nextval = next(it)
        print(nextval)
    except StopIteration:
        del it
        break
Every collection in Python is iterable.¶
We have already seen that iterators are used to drive for loops. They are also used in other places:
- To loop over a file line by line from disk
- In the making of list, dict, and set comprehensions
- In unpacking tuples
- In parameter unpacking in function calls (*args syntax)
An iterator defines both __iter__ and __next__ (the first one is only required to make sure an iterator is also an iterable).
Recap: An iterator retrieves items from a collection. The collection must implement __iter__.
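To illustrate a few of the uses listed above, here is a small sketch with a made-up iterable. The `Countdown` class and the `total` function are hypothetical and exist only for this example; `Countdown` implements `__iter__` by handing back a fresh iterator over a `range`.

```python
class Countdown:
    """Hypothetical iterable that counts down from start to 1."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Each call returns a new, independent iterator.
        return iter(range(self.start, 0, -1))

def total(*args):
    return sum(args)

c = Countdown(3)

squares = [n * n for n in c]   # list comprehension
first, second, third = c       # tuple unpacking
result = total(*c)             # parameter unpacking in a function call

print(squares, (first, second, third), result)
```

Because `__iter__` returns a new iterator on every call, the same `Countdown` object can be traversed several times, which is the difference between an iterable and an iterator.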
Generators¶
- A generator function looks like a normal function, but yields values instead of returning them.
- The syntax is (unfortunately) the same otherwise (PEP 255 -- Simple Generators).
- A generator is a different beast from a function. When a function with a yield keyword in it is called, it creates a generator.
- The generator is an iterator and gets an internal implementation of __iter__ and __next__.
def gen123():
    print("A")
    yield 1
    print("B")
    yield 2
    print("C")
    yield 3

g = gen123()
print(gen123, " ", type(gen123), " ", type(g))
print("A generator is an iterator.")
print("It has {} and {}".format(g.__iter__, g.__next__))
Some notes on generators¶
- When next is called on the generator, the function proceeds until the first yield.
- The function body is now suspended and the value in the yield is then passed to the calling scope as the outcome of the next.
- When next is called again, __next__ is called again (implicitly) on the generator; the function resumes where it left off and the next value is yielded.
- This continues until we reach the end of the function, the return of which raises a StopIteration in next.
Any Python function that has the yield keyword in its body is a generator function.
print(next(g))
print("\n")
print(next(g))
print("\n")
print(next(g))
print("\n")
print(next(g))  # raises StopIteration: the generator is exhausted
More notes on generators¶
- Generators yield one item at a time
- In this way, they feed the for loop one item at a time
for i in gen123():
    print(i, "\n")
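Since a generator is an iterator, a generator function can stand in for a hand-written iterator class. The sketch below shows one way the Sentence idea could be written with a generator as its `__iter__` method; the class name `GeneratorSentence` is hypothetical and chosen so it does not clash with the earlier class.

```python
class GeneratorSentence:
    """Iterable whose __iter__ is a generator function (no iterator class needed)."""
    def __init__(self, text):
        self.text = text
        self.words = text.split()

    def __iter__(self):
        # Calling this method creates a new generator each time.
        for word in self.words:
            yield word

a = GeneratorSentence("Dogs will save the world")
for item in a:
    print(item)
```

The generator keeps track of the current position itself, so the explicit index bookkeeping and the StopIteration handling of a separate iterator class are no longer needed.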