qsort(): The C library function with an unfortunate name
qsort()
At a glance most would assume such a function would invoke the quicksort algorithm, and they would be right - most of the time. Indeed MANY different C standard libraries bull ahead and use quicksort with varying amount of optimizations and call it a day. But an interesting note in the annals of history is that the C standard makes no mention of which algorithm qsort should implement, only HOW it's API should perform. This means in some interesting cases, library authors have done away with the quicksort algorithm all together and ventured down the road less travelled.Standards vs. Conventions
As mentioned above, the C standard does not specify the algorithm to be used for its sorting routine. Many blithely assume that the "q" in qsort() implies "quicksort". It doesn't. The "q" in qsort() originally was a shortening of "quick-er sort" implying that it was an algorithm that peformed better than your everyday O(n2) sorting routine. The folks over at calmerthanyouare.org did an analysis of different library qsort()'s in 2013 and what they uncovered was enlightening.When qsort() isn't quicksort
Several C standard libraries use algorithms other than quicksort and still perform very well. Most of these implementations are meant to be used on embeded devices where system resources are rather limited. But not all. glibc, arguably the most heavily used C standard library on linux systems takes an interesting approach: it defaults to mergesort, but can fall back to quicksort if the requisite system memory is not available for mergesorts O(n) space complexity. klibc the stripped down c library used by the linux kerenel implements comb sort, a bizarre choice indeed, being that comb sort is heavily modified versions of... bubble sort! I'm sure the developers had their reasons... musl libc really goes the distance. The developers decided that their qsort() would implement a sorting algorithm that doesn't get much attention: smooth sort. Invented by the venerable Edsger Dijkstra, smooth sort is like heapsort on steroids and not the easiest of algorithms to wrap your head around. A good write up on the what and how of smooth sort is available at https://www.keithschwarz.com/smoothsort/ uClibc goes oldschool, choosing to use shellsort with knuths 3h + 1 gap sequence. While at first glance it might seem odd that a library would use shellsort for its qsort() function, in this case it makes sense. While not written in stone, it is generally accepted that shellsort has a time complexity of O(n4/3) and in sedgewicks "Algorithms" he statesShellsort is the method of choice for many sorting applications because it has acceptable running time even for moderately large files ... and requires only a very small amount of code that is easy to get running. - Robert Sedgewick "Algorithms in C++" pg. 111
When qsort() IS quicksort
We've covered that qsort() is not always quicksort, but thats the exception not the rule. The large majority of C library implementations DO use quicksort. While perusing the various libc's available it becomes clear that their are certain optimizations that are practically universally implemented:- Small subfile's fall back to insertion sort - This optimization is estimated to yield up to a 20% increase in sorting speed, thats no small feat.
- Median of three pivot selection - this scheme for choosing the pivot item for that partition phase goes a long way to helping avoid quicksort from devolving to its O(n2) worst case time complexity. While it doesn't eliminate the possibility, it does make the chance very small. Some implementations go even farther, using a median of medians(called a "9ther") for subfiles above a certain size.
- Using iteration with a stack instead of recursion - regardless of if you use iteration or recursion quicksort needs a stack. By using iteration with an explicit stack you can limit the required stack space to lgN by always handling the larger array first. Not only is this a decrease in space complexity, but it also aids in preventing possible crashes from running out of stack space which can happen on very large arrays using recursion.
Wrapping up
Sometimes you just need a quick and dirty sort, and using your standard libraries default sorting algorithm makes sense. But in a serious sorting application when every ounce of performance must be obtained the only way to proceed is to implement your own. The benefits of doing this far out weigh the perceived difficulties. When you implement your programs sort function yourself you can fully take advantage of any pre-sorted properties of the data that you know of in advance. You can also tweak thresholds for insertion sort, median of three parition sizes, etc to really get the absolute best performance possible. But sometimes you just need a data set sorted simply, and in those cases C's qsort() is worth a look.Search
Recent Posts
-
Simple DB Migration with JDBC
-
Welcome to CodeBlahger, A Blahging Platform for Programmers.
-
Design Patterns: The Façade Pattern
-
The Interpreter Pattern: Implementing Interpreters the OOP way
-
Parsing Right-Associative Operators with Recursive Descent
-
BST Iterators Revisited: No Parent Pointer, No Stack, No Problem
-
Deleting Arbitrary Values from Binary Search Trees
-
Solving the N Queens Problem with Breadth First Search
-
Performing the Knights Tour in Linear Time
-
Knuth's Algorithm X For the Exact Cover Problem
Meta
Leave A Comment