BST Traversal¶
- We've stored our data in a BST
- This seemed like a good idea at the time because BSTs have some nice properties
- To be able to access/use our data, we need to be able to traverse the tree
Traversal Choices¶
There are three traversal choices based on an implicit ordering of the tree from left to right:
- In-order: Traverse left subtree, then current root, then right subtree
- Post-order: Traverse left subtree, then traverse right subtree, and then current root
- Pre-order: Current root, then traverse left subtree, then traverse right subtree
- Traversing a tree means performing some operation
- In our examples, the operation will be "displaying the data"
- However, an operation could be "deleting files"
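The three traversal orders above can be sketched in a few lines of Python. This is a minimal illustration, not a full BST implementation: the `Node` class, the `visit` callback, the `collect` helper, and the example tree are all assumptions made for this sketch.

```python
class Node:
    """A minimal binary tree node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def in_order(node, visit):
    """Left subtree, then current root, then right subtree."""
    if node is None:
        return
    in_order(node.left, visit)
    visit(node.key)
    in_order(node.right, visit)


def pre_order(node, visit):
    """Current root, then left subtree, then right subtree."""
    if node is None:
        return
    visit(node.key)
    pre_order(node.left, visit)
    pre_order(node.right, visit)


def post_order(node, visit):
    """Left subtree, then right subtree, then current root."""
    if node is None:
        return
    post_order(node.left, visit)
    post_order(node.right, visit)
    visit(node.key)


def collect(traversal, node):
    """Run a traversal with 'append to a list' as the operation."""
    out = []
    traversal(node, out.append)
    return out


# A small example BST:   5
#                       / \
#                      3   8
#                     / \   \
#                    1   4   9
root = Node(5, Node(3, Node(1), Node(4)), Node(8, None, Node(9)))

print(collect(in_order, root))    # [1, 3, 4, 5, 8, 9]
print(collect(pre_order, root))   # [5, 3, 1, 4, 8, 9]
print(collect(post_order, root))  # [1, 4, 3, 9, 8, 5]
```

Passing the operation in as `visit` keeps the traversal separate from what is done at each node, so "displaying the data" could be swapped for any other operation.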
Breakout Room (10 mins)¶
- Traverse the BST provided using in-order, post-order, and pre-order traversals. Write the resulting sequence for each traversal (a list is fine).
Heaps¶
We listed several types of data structures at the beginning of our data structures unit.
So far, we have discussed lists and trees (in particular binary trees and binary search trees).
Heaps are a type of binary tree, a little different from binary search trees.
Some Motivation: priority queues¶
- People may come to your customer service counter in a certain order, but you might want to serve your VIPs first!
- In other words, there is an "ordering" on your customers and you want to serve people in order of importance, most VIP first.
- This problem requires us to sort things by importance and then process them in that sorted order.
- A priority queue is a data structure for this, which allows us to do things more efficiently than simply re-sorting every time a new item comes in.
In an ordinary queue, items are inserted at one end and deleted from the other end.
The basic priority queue supports three primary operations:
- Insert: insert an item with "key" (e.g. an importance) $k$ into priority queue $Q$.
- Find Minimum: get the item whose key value is smaller than any other key in $Q$.
- Note: Depending on implementation, this may also be Find Maximum.
- Delete Minimum: Remove the item with minimum $k$ from $Q$.
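As a rough illustration of these three operations, Python's standard `heapq` module can treat a plain list as a min-heap based priority queue. The customer labels and priority numbers below are made up for this sketch; smaller keys are served first.

```python
import heapq

queue = []  # heapq operates on an ordinary list, kept in min-heap order

# Insert: each item is a (key, label) tuple, compared by key first
heapq.heappush(queue, (2, "regular customer"))
heapq.heappush(queue, (1, "VIP"))
heapq.heappush(queue, (3, "walk-in"))

# Find Minimum: the smallest item always sits at index 0
smallest = queue[0]
print(smallest)  # (1, 'VIP')

# Delete Minimum: remove and return the smallest item
served = heapq.heappop(queue)
print(served)    # (1, 'VIP')
print(queue[0])  # (2, 'regular customer')
```

`heapq` only provides a min-heap. A common workaround for "Find Maximum" behaviour is to insert negated keys.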
Comments on Implementation of Priority Queues¶
One could use an unsorted array and store a pointer to the minimum index; accessing the minimum is an $\mathcal{O}(1)$ operation.
- It's cheap to update the pointer when new items are inserted into the array because we update it in $\mathcal{O}(1)$ only when the new value is less than the current minimum.
- Finding a new minimum after deleting the old one requires a scan of the array ($\mathcal{O}(n)$ operation) and then resetting the pointer.
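The unsorted-array approach might look like the sketch below. The class name `UnsortedArrayPQ` and its method names are hypothetical, chosen only to mirror the three operations listed earlier.

```python
class UnsortedArrayPQ:
    """Priority queue on an unsorted list plus the index of the minimum."""

    def __init__(self):
        self.items = []
        self.min_index = None  # no minimum while the queue is empty

    def insert(self, key):
        # O(1): append, then update the pointer only if the new key is smaller
        self.items.append(key)
        if self.min_index is None or key < self.items[self.min_index]:
            self.min_index = len(self.items) - 1

    def find_min(self):
        # O(1): follow the stored pointer
        return self.items[self.min_index]

    def delete_min(self):
        # Move the minimum to the end so it can be removed in O(1)
        i = self.min_index
        self.items[i], self.items[-1] = self.items[-1], self.items[i]
        smallest = self.items.pop()
        # O(n): scan the whole array to find the new minimum
        if self.items:
            self.min_index = min(range(len(self.items)),
                                 key=self.items.__getitem__)
        else:
            self.min_index = None
        return smallest


pq = UnsortedArrayPQ()
for key in [5, 2, 9]:
    pq.insert(key)
print(pq.find_min())    # 2
print(pq.delete_min())  # 2
print(pq.find_min())    # 5
```

The linear scan in `delete_min` is the cost that the tree-based approach avoids.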
One could alternatively implement the priority queue with a balanced binary tree structure. Then we'll get performance of $\displaystyle\mathcal{O}\left(\log(n)\right)$ for both insertion and deletion of the minimum!
This leads us to heaps. Heaps are a type of balanced binary tree.
- A heap providing access to minimum values is called a min-heap
- A heap providing access to maximum values is called a max-heap
- Note that, in general, a single heap cannot be a min-heap and a max-heap at the same time
Priority queues are an abstract data type. They are often implemented using heaps.
Quick notes on balanced binary trees¶
- The height $h$ of a tree is the number of edges in its longest branch
- In a balanced binary tree, the height difference between left and right subtrees is no greater than $1$.
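The two definitions above translate fairly directly into code. The sketch below assumes the balance condition is meant to hold at every node, and uses the convention that an empty tree has height $-1$ so that a single node has height $0$. The bare `Node` class is a hypothetical helper.

```python
class Node:
    """A bare binary tree node; keys are omitted since only shape matters here."""
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def height(node):
    """Number of edges on the longest branch (assumed -1 for an empty tree)."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def is_balanced(node):
    """True if left and right subtree heights differ by at most 1 at every node."""
    if node is None:
        return True
    return (abs(height(node.left) - height(node.right)) <= 1
            and is_balanced(node.left)
            and is_balanced(node.right))


full = Node(Node(), Node())   # a root with two leaves
chain = Node(Node(Node()))    # three nodes in a straight line

print(height(full), is_balanced(full))    # 1 True
print(height(chain), is_balanced(chain))  # 2 False
```

This version recomputes heights repeatedly, which is fine for small examples but not the most efficient way to check balance.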
Heapsort¶
- Sorting $n$ items by repeatedly scanning for the minimum (selection sort) takes $\displaystyle\mathcal{O}\left(n^{2}\right)$ operations
- Using a heap to find each minimum takes $\mathcal{O}(n\log(n))$ operations in total
Implementing a sorting algorithm using a heap is called heapsort.
Heapsort is an in-place sort and requires only a constant amount of extra memory.
Note that there are many sorting algorithms nowadays. Python uses Timsort.
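One possible in-place heapsort is sketched below. It assumes the array layout with 0-based indexing, where the children of index $i$ sit at $2i+1$ and $2i+2$, and it uses a max-heap so that the result comes out in ascending order. The function names `sift_down` and `heapsort` are choices made for this sketch.

```python
def sift_down(a, i, size):
    """Bubble a[i] down within a[:size] until it is >= both of its children."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return  # heap property holds at this node
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heapsort(a):
    """Sort the list a in place in ascending order and return it."""
    n = len(a)
    # Build a max-heap by sifting down every non-leaf node, last one first
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Repeatedly move the current maximum to the end and shrink the heap
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a


print(heapsort([9, 4, 7, 1, 8, 2]))  # [1, 2, 4, 7, 8, 9]
```

Only swaps inside the input list are used, which is what makes the sort in-place.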
Let's get back to heaps.
A heap has two properties.
- Shape property
- A leaf node at depth $j>0$ can exist only if all the nodes at the previous depth exist. Nodes at any partially filled level are added "from left to right".
- Heap property
- For a min-heap, each node in the tree contains a key less than or equal to either of its two children (if they exist).
- This is also known as the labeling of a "parent node" dominating that of its children.
- For a max-heap, a parent node must be greater-than-or-equal to its children.
Heap Mechanics¶
- Heaps are a special binary tree that can be stored in arrays
- This is more memory-efficient than the Node class and pointer logic used in BSTs
- The first element in the array is the root key.
- The next two elements make up the first level of children. This is done from left to right.
- Then the next four and so on.
(Picture taken from https://towardsdatascience.com/data-structure-heap-23d4c78a6962)
Note: If indexing starts from $1$ and a parent node is at index $i$, then its children will be at indices $2i$ and $2i+1$.
Question: What if the indexing starts from $0$?
Answer: $2i+1$ and $2\left(i+1\right)$.
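With 0-based indexing, the index arithmetic can be wrapped in small helpers, and the min-heap property can be checked directly on the array. The function names here are hypothetical and only serve this sketch.

```python
def children(i):
    """Indices of the left and right children of index i (0-based)."""
    return 2 * i + 1, 2 * i + 2


def parent(i):
    """Index of the parent of index i (0-based, for i > 0)."""
    return (i - 1) // 2


def is_min_heap(a):
    """True if every non-root key is >= its parent's key."""
    return all(a[parent(i)] <= a[i] for i in range(1, len(a)))


print(children(0))                   # (1, 2)
print(children(3))                   # (7, 8)
print(parent(7), parent(8))          # 3 3
print(is_min_heap([1, 3, 2, 7, 4]))  # True
print(is_min_heap([2, 1, 3]))        # False
```

Because the shape property fills every level from left to right, the array has no gaps, so no pointers are needed to move between a parent and its children.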
Construct a Heap¶
To construct a heap, insert each new element that comes in at the left-most open spot.
This maintains the shape property but may violate the heap property.
Restore the Heap Property by "Bubbling Up"¶
Look at the parent and if the child "dominates" we swap parent and child. Repeat this process until the parent dominates or we reach the root.
Identifying the dominant key is now easy because it will be at the top of the tree.
This process is called heapify and must also be done at the first construction of the heap.
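Insertion with bubbling up could be sketched as follows for a min-heap stored in a 0-indexed Python list. The name `heap_insert` and the example keys are assumptions made for illustration.

```python
def heap_insert(heap, key):
    """Insert key into a min-heap stored in a list, then bubble it up."""
    heap.append(key)  # left-most open spot: keeps the shape property
    i = len(heap) - 1
    while i > 0:
        p = (i - 1) // 2  # parent index with 0-based indexing
        if heap[p] <= heap[i]:
            break  # parent dominates, heap property restored
        heap[p], heap[i] = heap[i], heap[p]
        i = p


heap = []
for key in [7, 3, 9, 1]:
    heap_insert(heap, key)
print(heap)  # [1, 3, 9, 7]
```

Each insertion moves along a single branch toward the root, so it takes at most as many swaps as the height of the tree.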
Deletion¶
Removing the dominant key creates a hole at the top (the first position in the array).
Fill this hole with the last element in the array, i.e. the rightmost leaf node on the bottom level.
This destroys the heap property!
So we now bubble this key down, swapping it with its dominant child at each step, until it dominates all its children.
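The deletion steps above might be written as below for a min-heap in a 0-indexed list. The name `heap_delete_min` is hypothetical, and the sketch assumes the heap is non-empty.

```python
def heap_delete_min(heap):
    """Remove and return the minimum of a non-empty min-heap stored in a list."""
    smallest = heap[0]
    last = heap.pop()  # take the rightmost leaf
    if heap:
        heap[0] = last  # fill the hole at the top
        i, n = 0, len(heap)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            child = i
            # Pick the smaller of the existing children, if it beats heap[i]
            if left < n and heap[left] < heap[child]:
                child = left
            if right < n and heap[right] < heap[child]:
                child = right
            if child == i:
                break  # this key now dominates its children
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest


heap = [1, 3, 9, 7]
print(heap_delete_min(heap))  # 1
print(heap)                   # [3, 7, 9]
```

Swapping with the smaller child matters: swapping with the larger one would leave a parent that fails to dominate the other child.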
Practice¶
- Construct a min-heap for the array $$\left[1, 8, 5, 9, 23, 2, 45, 6, 7, 99, -5\right].$$
- Delete $-5$ and update the min-heap.