IntroSort: Quicksort with a parachute

In a previous article on quicksort, I called it a fast sorting algorithm with an achilles heal: for certain inputs quicksort slows waaaaaay down. This is quite unfortunate, because on most inputs quicksort is very fast.

A lot of research has gone into improving quicksorts worst case performance, the majority of it focusing on the choice of pivot. Then in the mid 1990’s David Musser made a crucial breakthrough, and the result of this can be seen in the changes to the performance bounds of std::sort between C++98 and C++11: The earlier specification suggested a sorting implementation that performed in O(n log n) in the average case, clearly hinting towards quicksort. The later specification however, states that std::sort should provide an O(n log n) worst case guarantee.

So what happened to provoke this bump in performance? The answer is of course, the development of IntroSort. IntroSort starts with your standard optimized quicksort (median of 3 pivot selection, threshold cutoff insertion sorting) and adds one more crucial piece: the parachute. Introsort monitors its depth of recursion, and when it detects that it is heading towards its worst case, it pulls the rip cord and falls into heapsort: an O(n log n) worst case sorting algorithm. With this introspection, from which the name intro sort is derived, we can merge the best parts of both algorithms: quicksorts average case speed advantage, with heapsorts worst case gurantee of O(n log n)!

Not Just Any Heapsort…

The most prolific heapsort implementation encountered has two things that make it not particularly well suited for use in introsort. Introsort fall’s back to heapsort when it detects that the depth of recursion is getting to deep, It would thus make sense that we prefer an iterative heapify algorithm, as opposed to the typical recursive implementation. And crucially, we need to use a heapsort that is capable of sorting a slice of an array, without modifying any elements outside of the slice’s bounds.

Iterative heapify algorithms are well studied, and the following will suit our purpose fine:

template <class T>
void _downheap(T a[], int N, int k) {
    while (2*k <= N) {
        int j = 2*k;
        if (j < N && a[j] < a[j+1]) j++;
        if (!(a[k] < a[j])) break;
        exch(a, k, j); k = j;
    }
}

Now as far as point number 2 goes we have 2 options: we can modify _downheap by introducing to more parameters, left and right, and add tests that we are inside the bounds. This has the negative effect of both increasing complexity, and increasing the number of comparisons for each iteration of the loop. Thankfully, we can use a bit of pointer manipulation, so that our heapsort algorithm thinks its sorting an entire array, while actually only sorting a slice of it:

template <class T>
void heapsort(T a[], int l, int r) { //l and r are indices to sort between
    int N = r-l+1; //to provide to _downheap()
    T *pq = a+l-1; //creating a pointer pq, making pq[0] = a[l - 1];
    for (int k = N/2; k >= 1; k--) //now we perform standard heapsort
        _downheap(pq, N, k); //using the indirection provided in
    while (N > 1) { //pq + N as if it were an entire array
        exch(pq, 1, N);
        _downheap(pq,--N,1);
    }
}

You might think that you can skip the level of indirection and just replace N with r and 1 with l, but it doesn’t work that way, and many MANY intro sort tutorials out there get this wrong. The bug is easy to miss because the original implementation of Introsort did one final insertion sort pass to clean up small unsorted subarrays, instead of many calls to insertion sort on each subarray. This effectively hides the flaw in their implementation.

Insertion Sort the smaller slices

Every moden implementation of quicksort, mergesort, and introsort all use the same optimization of using insertion sort on arrays below a certain threshold, again implemented in such a way as to be able to sort a slice of the array:

template <class T>
void inssort(T a[], int l, int r) {
    for (int i = l; i <= r; i++) {
        int j = i; T v = a[j];
        while (j > l && a[j - 1] > v) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
}

Some implementations, such as the one used in golang’s standard library use shell sort instead of insertion sort, this can be beneficial if the threshold is of such a size that shell sorts improvement of insertion sort are worth the additional overhead.

With these powers combined….

The easiest way to string everything together is to start with a standard recursive quicksort, and build it up from there. Quicksort is of course made of three main parts: the sort loop, the partition scheme, and the pivot selection.

I personally prefer Sedgewick’s crossing pointers partitioning algorithm:

template <class T>
inline void exch(T a[], int l, int r) {
    T tmp = a[l];
a[l] = a[r];
a[r] = tmp;
}

template <class T>
int partition(T a[], int l, int r) {
    medOf3(a, l, r);
    int i = l - 1, j = r; T v = a[r];
    for (;;) {
        while (a[++i] < v);
        while (a[--j] > v) if (j == l) break;
        if (i >= j) break;
        exch(a, i, j);
    }
    exch(a, i, r);
    return i;
}

The median of 3 pivot selection method is the canonical pivot choosing algorithm in Musser’s reference implementation of Introsort. The following median of 3 method puts the three values at a[l], a[m], and a[r] into sorted order, and then places a[m] into a[r], this places the median of the three values at the pivot position for the partition phase.

 void medOf3(T a[], int l, int r) {
    int m = (l+r)/2;
    if (a[m] < a[l])
        exch(a, m, l);
    if (a[r] < a[m]) {
        exch(a, m, r);
        if (a[m] < a[l])
            exch(a, l, m);
    }
    exch(a, m, r);
}

Many implementations of Introsort and quicksort use more complicated pivot selection schemes such as median of medians or 9’thers. One could also use a simple random pivot selection, or choose the middle element as the pivot, both choices being quick to compute and simple to implement and tend to work pretty decently as well.

The final piece of the puzzle is the main sorting loop, which determines when to use insertion sort, when to fall back on heapsort, or whether to proceed using quicksort:

template <class T>
void _introsort(T a[], int l, int r, int d) { //d tracks recursion depth
    if (d == 0) { //when it hits max depth
        heapsort(a, l, r); //fall back on heapsort
        return;
    }
    if (r - l < THRESHOLD) { //use insertion sort on
        inssort(a, l, r); //small subarrays
        return;
    }
    d -= 1; //otherwise quicksort.
    int i = partition(a, l, r);
    _introsort(a, l, i - 1, d);
    _introsort(a, i + 1, r, d);
}

template <class T>
void introsort(T a[], int l, int r) {
    int d = floor(2 * log(r-l));
    _introsort(a, l, r, d);
}

And that’s it, the algorithm that allowed the C++ standards committee to tighten the worst case bound of std::sort while still using a quicksort variant!