A Bottom Up Natural Mergesort Variant For Linked Lists

Mergesort was the first O(nlogn) sorting algorithm used. Donald Knuth has speculated that it may have even been the first sorting algorithm ever programmed into a computer, with John Von Neuman being aware of it as early as 1945.[1] For all of its history, and despite the desirable properties of O(nlogn) worst case complexity and being a stable sort, it has been hamstrung by the fact that it needs an auxiliary buffer to accomplish this. In other words: Mergesort is not an in-place algorithm.

But that’s only if you’re talking about sorting an array. Mergesort is the gold standard when it comes to sorting linked lists. Stable, with O(1) memory usage, Mergesort has O(nlogn) worst case guarantees in it’s common recursive top down variety.

There are many variants of Mergesort for linked lists based on the implementation choices of the author regarding the use of header nodes, dummy nodes, explicitly carrying the list length, or splitting the list through pointer chasing. As you can see there are a lot of ways to implement a mergesort – some better than others.

The method I will be covering in this post was introduced by Golin and Sedgewick in a 1993 letter to the IPL[2]. Interestingly, it is a bottom up Mergesort with an implementation for linked lists that is so delightfully simple it’s a wonder that it gets so little mention in the available literature. Even better: It’s been proven to use an optimal number of comparison in the worst case, performance which eludes the iterative array based version.

Queue Mergesort

The Queue Mergesort algorithm is made up of two phases. In the first phase it decomposes the input list into single nodes. A queue is used to store the individual nodes. This process is destructive of the input list, making use of a temporary pointer to the current node, while the head pointer moves one space forward. The current nodes next pointer is set to null and the ‘current’ node is enqueued.

 void decomposeList(queue<link>& fq, link& h) {
    while (h != nullptr) {
        link t = h; h = h->next;
        t->next = nullptr;
        fq.push(t);
    }
}

Once the entire list is consumed and the queue is made up of the individual nodes (a.k.a. single lists) we are ready for the merge phase. The merge phase works by popping the first two items from the queue, merging them together, and pushing the resulting merged list back on to the queue.

 inline link dequeue(queue<link>& q) {
    link tmp = q.front();
    q.pop();
    return tmp;
}

void mergesort(link& h) {
    queue<link> fq;
    decomposeList(fq, h);
    while (fq.size() != 1) {
        fq.push(merge(dequeue(fq), dequeue(fq)));
    }
    h = dequeue(fq);
}

The helper function dequeue() is used because the FIFO queue provided by the STL has a pop() which does not return the item being removed. It is not strictly necessary, but makes the code easier to read. The algorithm for the actual merge is a normal iterative algorithm for merging two sorted linked lists[4].

 link merge(link a, link b) {
    node dummy(0, nullptr); link c = &dummy;
    while (a != nullptr && b != nullptr) {
        if (a->info < b->info) {
            c->next = a; a = a->next; c = c->next;
        } else {
            c->next = b; b = b->next; c = c->next;
        }
    }
    c->next = (a == nullptr) ? b:a;
    return dummy.next;
}

Putting it all together.

#include <iostream>
#include <queue>
using namespace std;

struct node {
    int info;
    node* next;
    node(int i = 0, node* n = nullptr) {
        info = i;
        next = n;
    }
};

typedef node* link;

link newlist() {
    link h = nullptr;
    for (int i = 0; i < 12; i++)
        h = new node(rand() % 50, h);
    return h;
}

void print(link h) {
    for (link t = h; t != nullptr; t = t->next)
        cout<<t->info<<" ";
    cout<<endl;
}

link merge(link a, link b) {
    node dummy(0, nullptr); link c = &dummy;
    while (a != nullptr && b != nullptr) {
        if (a->info < b->info) {
            c->next = a; a = a->next; c = c->next;
        } else {
            c->next = b; b = b->next; c = c->next;
        }
    }
    c->next = (a == nullptr) ? b:a;
    return dummy.next;
}

inline link dequeue(queue<link>& q) {
    link tmp = q.front();
    q.pop();
    return tmp;
}

void decomposeList(queue<link>& fq, link& h) {
    while (h != nullptr) {
        link t = h; h = h->next;
        t->next = nullptr;
        fq.push(t);
    }
}

void mergesort(link& h) {
    queue<link> fq;
    decomposeList(fq, h);
    while (fq.size() != 1) {
        fq.push(merge(dequeue(fq), dequeue(fq)));
    }
    h = dequeue(fq);
}

int main() {
    link head = newlist();
    print(head);
    mergesort(head);
    print(head);
    return 0;
}

max@MaxGorenLaptop:/mnt/c/Users/mgoren$ ./bmso
27 12 21 49 42 36 35 43 15 27 36 33
12 15 21 27 27 33 35 36 36 42 43 49
max@MaxGorenLaptop:/mnt/c/Users/mgoren$

A hop skip to a natural merge sort variant

In the previous algorithm we proceeded by splitting the list into singletons. But the only requirement for our merge algorithm is that both list be sorted before being merged. This means we can exploit naturally occuring runs of already ordered sections of the list, better yet: It only requires a very slight modification to the decomposeList() method:

 void decomposeList(Queue& fq, link& h) {
    link q;
    while (h != nullptr) {
        link t = h;
        while (h && h->next) {
            if (h->info <= h->next->info) h = h->next;
            else break;
        }
        if (h->next != nullptr) {
            q = h->next;
            h->next = nullptr;
            h = q;
        } else {
            h = h->next;
        }
        fq.push(t);        
    }
}

By now I’m sure you’re probably anxious to know how this algorithm stacks up to other linked list sorts? Then you’re in for a treat, because I put together a series of tests to compare the sorting performance against the C++ standard libraries implementation of merge sort for singly linked lists.

#include <iostream>
#include <list>
#include <queue>
#include "qbumsort.hpp"
using namespace std;

double stdlistsort(forward_list<int>& fl) {
    auto t1 = chrono::steady_clock::now();
    fl.sort();
    auto t2 = chrono::steady_clock::now();
    auto time = chrono::duration<double, std::milli>(t2 - t1);
    return time.count();
}

double queuebumsort(link head) {
    auto t1 = chrono::steady_clock::now();
    mergesort(head);
    auto t2 = chrono::steady_clock::now();
    auto time = chrono::duration<double, std::milli>(t2 - t1);
    return time.count();
}

int main() {
    cout<<setw(8)<<"size"<<"  "<<"stl sort"<<"   "<<"bumsort"<<endl;
    for (int i = 100; i <= 2000000; i >= 1000000 ? i*=2:i*= 10) {
        link head = newlist(i, 30, RAND_MAX);
        forward_list<int> fl;
        for (link t = head; t != nullptr; t = t->next)
            fl.push_front(t->info);
        auto stltime = stdlistsort(fl);
        auto bmstime = queuebumsort(head);
        cout<<setw(8)<<i<<": "<<stltime<<"  "<<bmstime<<endl;
        link t = head;
        while (head != nullptr) {
            t = head;
            head = head->next;
            delete t;
        }
    }
    return 0;
}

max@MaxGorenLaptop:/mnt/c/Users/mgoren$ ./bumsort
size stl sort bumsort
100: 0.010467 0.012569
1000: 0.146183 0.102174
10000: 1.9848 1.39455
100000: 59.7368 24.4789
1000000: 739.11 435.598
2000000: 1764.2 1388.3
max@MaxGorenLaptop:/mnt/c/Users/mgoren$

From these tests doing multiple runs of different sizes using random data, the optimized bottom up queue Mergesort is sorting the list over twice as fast as the STL algorithm for a wide range of list sizes, outperforming it on all but the smallest list size.

Final Thoughts

This algorithm is a beautiful example of taking a low level idea and transliterating it into code. It should be kept in mind that like all things, this simplicity comes at a cost: the use of an auxiliary queue in this algorithm means that it does not enjoy the same O(1) memory usage as its recursive top down cousin. Could the nodes of the linked list being sorted serve a double purpose as an implicit queue? Maybe, why don’t you try it out and let me know the results! Regardless, it is still an interesting algorithm, which can also be used – albeit with some modification – to sort arrays as well.

As interesting experiment for the curious: will this algorithm still work if you use a stack instead of queue? If so, is there an effect on it’s performance? Why? Following along the line of thought from above regarding reducing auxiliary memory usage, what optimizations could be applied to the algorithm presented? Are their any existing algorithms that do something similar? (Hint: think python’s listsort)

That’s all for this time, so as always: Happy Hacking!

Further Reading

[1] Knuth, D. “The Art of Computer Programming: Sorting And Searching”, Vol. 3, 1973.

[2] Golin, M.J. and Sedgewick, R. “Queue Mergesort”, Information Processing Letters, Vol 48, 1993. https://sedgewick.io/wp-content/themes/sedgewick/papers/1993Queue.pdf

[3] Munro, J. I. and Wild, S. “Nearly-Optimal Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs”, ESA 2018, https://drops.dagstuhl.de/opus/volltexte/2018/9526/pdf/LIPIcs-ESA-2018-63.pdf

[4] Sedgewick, R. “Algorithms in C++”, (2nd ed.) Ch 12. pp. 168

Leave a Reply

Your email address will not be published. Required fields are marked *