In my data structures class I have been learning about Binary Heaps. I have been coding examples of min and max Binary Heaps in both Python and C# the past few days, and have become quite fond of the simplicity and elegance of the data structure. Typically you just need the ability to push an element onto the Binary Heap and pop either the minimum or maximum element from the Binary Heap. However, if you write other methods, like
change priority or
remove if using it as a Priority Queue, the solutions are very elegant, because they re-use the
siftdown methods already in place for push and pop. It's a pretty amazing data structure when you get to know its inner-workings. The fact that you can use it to sort items (Heapsort) makes it all that much cooler.
Before I dive into a simple implementation of a min Binary Heap in Python, I first want to mention
heapq, which is a collection of functions offering min Binary Heap functionality in the Python collections module.
heapq in Python
I was first introduced to
heapq in Python when I was looking for alternative solutions to finding the two largest integers in a list in Python. This was an actual problem I needed to solve in my computer science class. I had solved it using a number of methods and I was looking for other solutions. This is when I found
heapq. It has a wonderful method, called
nlargest, that returns the first n largest numbers in a list.
integers = [1, 16, 3, 39, 26, 4, 8, 16] # get 2 largest values largest_integers = heapq.nlargest(2, integers) # [39, 26]
At the time I knew nothing about Binary Heaps and
heapq, but now that I understand how Binary Heaps work,
heapq is a wonderful set of functions that I can now fully appreciate.
heapq and Min Binary Heap
What's interesting about
heapq is that it is not a container. You define a list that will be your heap and pass it to
heapq to perform heap operations. You can start with an empty list and push a bunch of integers onto the heap and then pop them off one at a time, essentially performing a Heapsort. Because
heapq is a min Binary Heap, it pops from lowest to highest.
import heapq heap =  heapq.heappush(heap, 12) heapq.heappush(heap, 19) heapq.heappush(heap, 88) heapq.heappush(heap, 62) heapq.heappush(heap, 2) heapq.heappush(heap, 14) heapq.heappush(heap, 92) while len(heap) > 0: print(heapq.heappop(heap), end=' ') # 2 12 14 19 62 88 92
You can also start with a list of values and then perform push and pop operations. In this case, you will first need to
heapify the list before pushing and popping values.
import heapq heap = [12, 19, 88, 62, 2, 14, 92] heapq.heapify(heap) heapq.heappush(heap, 3) while len(heap) > 0: print(heapq.heappop(heap), end=' ') # 2 3 12 14 19 62 88 92
After writing several samples of min and max Binary Heaps, I like that
heapq is a collection of functions that provide Binary Heap functionality as opposed to also being a container. Allows the developer to have more control of the container at the risk of perhaps writing code that alters the container unexpectedly. There is more to
heapq, but this gives you a taste of the two main ways to use
heapq, with or without starting values, and the fundamental push and pop operations offered by a Binary Heap.
Custom Min Binary Heap in Python
There are 2 important features of a Binary Heap. One, it is a complete Binary Tree, which means it can be stored as an Array. Two, the nodes are ordered in a particular way. For a max Binary Heap, the value of a node is at least the value of its children. For a min Binary Heap, the value of a node is no more than the value of its children.
Here is a simple, bare-bones example of a min Binary Heap in Python providing simple push and pop operations.
class Heap: def __init__(self): self.__heap =  self.__last_index = -1 def push(self, value): self.__last_index += 1 if self.__last_index < len(self.__heap): self.__heap[self.__last_index] = value else: self.__heap.append(value) self.__siftup(self.__last_index) def pop(self): if self.__last_index == -1: raise IndexError('pop from empty heap') min_value = self.__heap self.__heap = self.__heap[self.__last_index] self.__last_index -= 1 self.__siftdown(0) return min_value def __siftup(self, index): while index > 0: parent_index, parent_value = self.__get_parent(index) if parent_value <= self.__heap[index]: break self.__heap[parent_index], self.__heap[index] =\ self.__heap[index], self.__heap[parent_index] index = parent_index def __siftdown(self, index): while True: index_value = self.__heap[index] left_child_index, left_child_value = self.__get_left_child(index, index_value) right_child_index, right_child_value = self.__get_right_child(index, index_value) if index_value <= left_child_value and index_value <= right_child_value: break if left_child_value < right_child_value: new_index = left_child_index else: new_index = right_child_index self.__heap[new_index], self.__heap[index] =\ self.__heap[index], self.__heap[new_index] index = new_index def __get_parent(self, index): if index == 0: return None, None parent_index = (index - 1) // 2 return parent_index, self.__heap[parent_index] def __get_left_child(self, index, default_value): left_child_index = 2 * index + 1 if left_child_index > self.__last_index: return None, default_value return left_child_index, self.__heap[left_child_index] def __get_right_child(self, index, default_value): right_child_index = 2 * index + 2 if right_child_index > self.__last_index: return None, default_value return right_child_index, self.__heap[right_child_index] def __len__(self): return self.__last_index + 1
In class I was required to write Binary Heaps using a fixed-size array with a maximum size, but in this case I let the Python List grow as big as necessary. I also don't delete cells in the list. I just use a pointer, called
last_index, to point to the last value in the heap.
I also re-wrote the
siftdown methods to be iterative functions as opposed to recursive functions. I like the elegance of recursive functions and it was fine for class assignments, but most developers comment "negatively" on them.
The importat public methods are
pop for adding and removing items from the heap. The important private methods are
siftdown, which are responsible for maintaining the ordered hierarchy of the heap.
Let's execute a Heapsort on a list of integers by pushing them all onto the heap and popping them off one-by-one.
values = random.sample(range(10000), 10) print(values) h = Heap() for v in values: h.push(v) while len(h) > 0: print(h.pop(), end=' ') [3383, 1851, 4975, 435, 5008, 480, 5151, 6462, 2936, 1621] 435 480 1621 1851 2936 3383 4975 5008 5151 6462
Looks good. The original list of random integers is in no particular order. However, as we pop each value from the min Binary Heap they arrive in numerically increasing order (Heapsort)! If this were a max Binary Heap they would arrive in decreasing order.
Such an elegant data structure.
Posted by Koder Dojo
I am a freelance C#, ASP.NET MVC Developer learning Python. I am taking several online computer science, algorithms, cryptography, and now data science courses using Python. It is a wonderful learning experience and I am journaling my adventures on this website. I am eager to meet new people and share my experiences. You can find me on twitter as @KoderDojo.