In my data structures class I have been learning about Binary Heaps. I have been coding examples of min and max Binary Heaps in both Python and C# the past few days, and have become quite fond of the simplicity and elegance of the data structure. Typically you just need the ability to push an element onto the Binary Heap and pop either the minimum or maximum element from the Binary Heap. However, if you write other methods, like change priority or remove if using it as a Priority Queue, the solutions are very elegant, because they re-use the siftup and siftdown methods already in place for push and pop. It's a pretty amazing data structure when you get to know its inner-workings. The fact that you can use it to sort items (Heapsort) makes it all that much cooler.

Before I dive into a simple implementation of a min Binary Heap in Python, I first want to mention heapq, which is a collection of functions offering min Binary Heap functionality in the Python collections module.

heapq in Python

I was first introduced to heapq in Python when I was looking for alternative solutions to finding the two largest integers in a list in Python. This was an actual problem I needed to solve in my computer science class. I had solved it using a number of methods and I was looking for other solutions. This is when I found heapq. It has a wonderful method, called nlargest, that returns the first n largest numbers in a list.

integers = [1, 16, 3, 39, 26, 4, 8, 16]

# get 2 largest values
largest_integers = heapq.nlargest(2, integers)  # [39, 26]

At the time I knew nothing about Binary Heaps and heapq, but now that I understand how Binary Heaps work, heapq is a wonderful set of functions that I can now fully appreciate.

heapq and Min Binary Heap

What's interesting about heapq is that it is not a container. You define a list that will be your heap and pass it to heapq to perform heap operations. You can start with an empty list and push a bunch of integers onto the heap and then pop them off one at a time, essentially performing a Heapsort. Because heapq is a min Binary Heap, it pops from lowest to highest.

import heapq

heap = []

heapq.heappush(heap, 12)
heapq.heappush(heap, 19)
heapq.heappush(heap, 88)
heapq.heappush(heap, 62)
heapq.heappush(heap, 2)
heapq.heappush(heap, 14)
heapq.heappush(heap, 92)

while len(heap) > 0:
    print(heapq.heappop(heap), end=' ')
    
# 2 12 14 19 62 88 92

You can also start with a list of values and then perform push and pop operations. In this case, you will first need to heapify the list before pushing and popping values.

import heapq

heap = [12, 19, 88, 62, 2, 14, 92]

heapq.heapify(heap)

heapq.heappush(heap, 3)

while len(heap) > 0:
    print(heapq.heappop(heap), end=' ')

# 2 3 12 14 19 62 88 92

After writing several samples of min and max Binary Heaps, I like that heapq is a collection of functions that provide Binary Heap functionality as opposed to also being a container. Allows the developer to have more control of the container at the risk of perhaps writing code that alters the container unexpectedly. There is more to heapq, but this gives you a taste of the two main ways to use heapq, with or without starting values, and the fundamental push and pop operations offered by a Binary Heap.

Custom Min Binary Heap in Python

There are 2 important features of a Binary Heap. One, it is a complete Binary Tree, which means it can be stored as an Array. Two, the nodes are ordered in a particular way. For a max Binary Heap, the value of a node is at least the value of its children. For a min Binary Heap, the value of a node is no more than the value of its children.

Here is a simple, bare-bones example of a min Binary Heap in Python providing simple push and pop operations.

class Heap:
    def __init__(self):
        self.__heap = []
        self.__last_index = -1

    def push(self, value):
        self.__last_index += 1
        if self.__last_index < len(self.__heap):
            self.__heap[self.__last_index] = value
        else:
            self.__heap.append(value)
        self.__siftup(self.__last_index)

    def pop(self):
        if self.__last_index == -1:
            raise IndexError('pop from empty heap')

        min_value = self.__heap[0]

        self.__heap[0] = self.__heap[self.__last_index]
        self.__last_index -= 1
        self.__siftdown(0)

        return min_value

    def __siftup(self, index):
        while index > 0:
            parent_index, parent_value = self.__get_parent(index)

            if parent_value <= self.__heap[index]:
                break

            self.__heap[parent_index], self.__heap[index] =\
                self.__heap[index], self.__heap[parent_index]

            index = parent_index

    def __siftdown(self, index):
        while True:
            index_value = self.__heap[index]

            left_child_index, left_child_value = self.__get_left_child(index, index_value)
            right_child_index, right_child_value = self.__get_right_child(index, index_value)

            if index_value <= left_child_value and index_value <= right_child_value:
                break

            if left_child_value < right_child_value:
                new_index = left_child_index
            else:
                new_index = right_child_index

            self.__heap[new_index], self.__heap[index] =\
                self.__heap[index], self.__heap[new_index]

            index = new_index

    def __get_parent(self, index):
        if index == 0:
            return None, None

        parent_index = (index - 1) // 2

        return parent_index, self.__heap[parent_index]

    def __get_left_child(self, index, default_value):
        left_child_index = 2 * index + 1

        if left_child_index > self.__last_index:
            return None, default_value

        return left_child_index, self.__heap[left_child_index]

    def __get_right_child(self, index, default_value):
        right_child_index = 2 * index + 2

        if right_child_index > self.__last_index:
            return None, default_value

        return right_child_index, self.__heap[right_child_index]

    def __len__(self):
        return self.__last_index + 1

In class I was required to write Binary Heaps using a fixed-size array with a maximum size, but in this case I let the Python List grow as big as necessary. I also don't delete cells in the list. I just use a pointer, called last_index, to point to the last value in the heap.

I also re-wrote the siftup and siftdown methods to be iterative functions as opposed to recursive functions. I like the elegance of recursive functions and it was fine for class assignments, but most developers comment "negatively" on them.

The importat public methods are push and pop for adding and removing items from the heap. The important private methods are siftup and siftdown, which are responsible for maintaining the ordered hierarchy of the heap.

Let's execute a Heapsort on a list of integers by pushing them all onto the heap and popping them off one-by-one.

values = random.sample(range(10000), 10)
print(values)

h = Heap()
for v in values:
    h.push(v)

while len(h) > 0:
    print(h.pop(), end=' ')


[3383, 1851, 4975, 435, 5008, 480, 5151, 6462, 2936, 1621]

435 480 1621 1851 2936 3383 4975 5008 5151 6462

Looks good. The original list of random integers is in no particular order. However, as we pop each value from the min Binary Heap they arrive in numerically increasing order (Heapsort)! If this were a max Binary Heap they would arrive in decreasing order.

Such an elegant data structure.

Best Wishes!

Posted by Koder Dojo

I am a freelance C#, ASP.NET MVC Developer learning Python. I am taking several online computer science, algorithms, cryptography, and now data science courses using Python. It is a wonderful learning experience and I am journaling my adventures on this website. I am eager to meet new people and share my experiences. You can find me on twitter as @KoderDojo.