The Heapsort algorithm

Ahh, Heaps. 

Heaps are one of those data structures that turn up everywhere. They serve a humble purpose: find the maximum value (depending on if you’re using min or max ordering) of a set efficiently. It is this very purpose that imbues them with the ability to implement a very simple, efficient sorting algorithm.

To be clear, in this article when I say “Heap” i’m talking about array-based max-ordered binary heaps.

The naiive approach…

Do get the basic idea of how heapsorting works we could implement heapsort with a priority queue, with the drawback of that approach requiring O(n) extra space, regardless it is worth while as a way to gain further understanding of what it is we are striving to accomplish. To sort with a priority queue, you simply add the entire set to the PQ, and then pop the maximum value off and store it in the array in decreasing order from the last position until the priority queue is empty, yielding our original set but in sorted order. Lets take a look:

It works, but we can do better. In fact, we can perform heapsort in place. Meaning we can whittle that pesky O(n) space requirement all the way down to O(1)!

The “Heapify” Algorithm

The act of taking an array and re-ordering it so that it is heap-ordered is appropriately named “heapifying” and various algorithm’s have been created to maintain heap order in an array. Bottom up, Top down, Iterative, Recursive – theres alot of ways to skin a cat. In his 1986 book “Programming Pearls” Jon Bentley gives a fabulous discussion of heaps and there construction. 

One of the more common heapify algorithms you see is straight out of the previously mentioned book, and is a straight forward recursive implementation:

Heapsort

Armed with our heapify algorithm it is no a simple affair to write the sorting algorithm its self. The first step is to scan through the first half of the array heapifying it as we go. With that done, we repeatedly pop off the first element moving it to the back of the array, and re-heapifying the array each time.

Because swap() is an O(1) operation, and heapify is O(log n), and we call swap & heapify for each element in the array(O(n)), heapsort is clearly an O(n log n) sorting algorithm.

Wrap up

So there you have it: heapsort. Heap sort has alot going for it: it’s an in-place O(n log n) worst case sorting algorithm. Unfortunately, it also happens to be unstable – which in an of its self is not the biggest problem – quicksort is also unstable. However, Big O notation is masking something: those invisible constants that are ignored in big O are actually quite, well, big. This means despite possessing O(n log n) worst case behaviour, it is usually about 2x SLOWER than quicksort. So while heapsort can provide efficient sorting, there is USUALLY a better option, but it is still an algorithm worth knowing. As a side note, smoothsort is a variant of heapsort invented by E. Dijkstra is able to provide near-linear sorting times for data that is almost sorted, but it is based on a non-trivial data structure called Leonardo heaps and its implementation is not the easiest to grok

Leave a Reply

Your email address will not be published. Required fields are marked *