Binary Insertion Sort & Minimum Comparison Sorting

In the early days of computing when computer science practitioners worked in incredibly resource constrained environments, and the vast tomes of research into efficient sorting had yet to be composed and algorithmic complexity was still in its infancy much research was done in to deriving sorting algorithms that performed their task with the minimum number of comparisons possible.

For some data types, such as strings, where multiple individual comparisons are conducted to perform the overall comparison finding an algorithm that performs the minimum number of them can yield a decent increase in performance. It is from this branch of research that binary insertion sort arose.

In this post I will be going over the binary insertion sort algorithm, and discuss why its not quite the magical sorting bullet that programmers hoped it to be at the time.

Insertion Sort

Insertion sort has long been recognized as the “fastest of the slow” sorting algorithms, that is, while it is in the same class of algorithms as bubble sort and selection sort it outperforms both by a decent margin. Insertion sort works by splitting the array into sorted and un sorted segments, and values from the unsorted portion are inserted into the sorted portion in the proper order 1 by 1.

template <class T>
void insertionsort(T a[], int l, int r) {
    for (int i = l; i < r; i++) {
        int j = i; T v = a[i];
        while (j > l && a[j - 1] > v) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
}

The inner loop or insertion sort is comprised of a linear search for the proper insertion point, shifting array items over as needed. Binary Insertion sort makes use of – surprise! – binary search to find the insertion position.

Finding Insertion Positions With Binary Search

The sorting problem and the searching problem are in some aspects, opposite sides of the same coin. If searching can be sped up by sorting then it follows that we should be able to speed up sorting can by efficient searching where required.

template <class T>
int binary_search(T a[], int l, int r, T k) {
    while (l <= r) {
        int m = l+(r-l)/2;
        if (k < a[m]) { r = m - 1; }
        else if (k > a[m]) { l = m + 1; }
else { return m; }
    }
    return l;
}

Under normal circumstances, binary search would return some kind of “not found” flag on a search miss. By changing binary search so that it returns the last-used ‘left’ value before terminating it returns the array position where the value being searched for should be inserted!

Binary search has an every case complexity of O(log N) compared to linear searches O(N). For any non-trivial value of N, the savings are substantial: It should take binary search 16 comparisons in the worst case to find a value in a one million element list, where linear search needs to check half a million records on average and the entire list in the worst case.

With Our Powers Combined….

Binary Insertion sort is exactly what the name implies. It is standard insertion sort, but with the inner loop is comparison optimized to perform the minimum number of comparisons by utilizing binary search. As we know, binary search only works on an already sorted sequence. Thankfully for us, Insertion sort maintains a growing sorted section at the beginning of the array.

On each iteration of the outer loop, the first element in the unsorted portion of the array is used as the key for binary search, which searches the sorted portion and returns the index where it should be inserted. Next, all of the values to the right of the returned position are shifted right by 1 space to make space for the value to be put in its sorted position.

template <class T>
void binary_insertion_sort(T a[], int l, int r) {
    for (int i = l; i <= r; i++) {
        int j = i;
        T v = a[j];
        int spos = binary_search(a, l, i, v); //Note the upper bound
        while (j > spos) {
            a[j] = a[j - 1];
            j--;
        }
        a[spos] = v;
    }
}

If insertion into a sorted array wasn’t an O(n) operation, this algorithm would be an O(n log n) algorithm. Indeed, when it comes to the worst case, Binary Insertion Sort performs O(n log n) comparisons. Alas, the complexity is unfortunately still dominated by the need to shift large portions of the array on every insertion leaving us with an O(n^2) worst case overall complexity. While this algorithm does not yield the kind of performance differences you see between an O(n^2) algorithm and O(n log n) algorithm, however it is still significantly faster than regular insertion sort in practice, especially when comparisons are an expensive operation.

Until next time, Happy Hacking!