
The Sorting Algorithm Shootout

In studying algorithm analysis you come across common tropes and perpetuated claims that most people take at face value: worst case, average case, and best case complexities, to name a few. Tweak algorithm X by doing Y to yield a Z% performance increase. Algorithm M will outperform algorithm N because XYZ. But how do they REALLY fare when put head to head? I was curious about this myself and set out to test a sampling of sorting algorithms and collect some data. What I ended up with was nine sorting algorithms, five formalized tests, a profiling routine, and some results that surprised the heck out of me.

The Algorithms

All algorithms were tasked with sorting randomly generated integers stored in an std::vector.

Standard Quadratic-Time Algorithms:

  • Bubble Sort
  • Selection Sort
  • Insertion Sort

Quadratic-Time Algorithms with Optimizations

  • Selection Sort+ – Both the least AND greatest values are obtained in each pass, so the array gets sorted from both ends towards the middle (a sketch follows this list).
  • Bubble Sort 2 – Values from opposite ends of the range are compared instead of neighboring values (positions i and N − i, compared while i < N − i, where N is the number of elements in the vector).
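
To make the Selection Sort+ idea concrete, here is a minimal sketch of a double-ended selection pass. This is simplified for illustration – the exact implementations are in the source linked at the end of the post:

#include <algorithm> // std::swap
#include <vector>

// Sketch: selection sort that places both the minimum and the maximum of the
// remaining range on each pass, shrinking the unsorted window from both ends.
void selectionSortPlus(std::vector<int>& v) {
    if (v.empty()) return;
    int lo = 0, hi = static_cast<int>(v.size()) - 1;
    while (lo < hi) {
        int minIdx = lo, maxIdx = lo;
        for (int i = lo; i <= hi; i++) {
            if (v[i] < v[minIdx]) minIdx = i;
            if (v[i] > v[maxIdx]) maxIdx = i;
        }
        std::swap(v[lo], v[minIdx]);
        if (maxIdx == lo) maxIdx = minIdx; // the max was just moved by the first swap
        std::swap(v[hi], v[maxIdx]);
        lo++;
        hi--;
    }
}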

O(n log n) Algorithms

  • Shell Sort
  • Heapsort
  • Merge Sort
  • Quicksort

The Test Platform

The results shown here are from tests conducted on an Apple MacBook Pro with a 2.7 GHz Intel i7 and 8 GB of RAM, running macOS High Sierra 10.13.6.

The Tests

I devised a series of tests using vectors of different sizes filled with random values (a simplified sketch of the timing harness follows the list):

  1. Test one: each algorithm sorts 10 vectors of 100 randomly generated integers between INT_MIN and INT_MAX. The average time of these ten runs is calculated, and each algorithm is ranked based on this average.
  2. Test two: each algorithm sorts 5 vectors of 3000 randomly generated integers between INT_MIN and INT_MAX. For this test all nine algorithms receive the same data set, which is sorted 5 times, and the algorithms are ranked based on the average time it takes to sort the data.
  3. Test three: each algorithm sorts 5 vectors of 1000 randomly generated integers within a very limited range (25–75). This is to simulate data sets which are “mostly sorted”. Once again the algorithms are ranked based on the average time it takes to sort the data set; a new random vector is generated for each run.
  4. Test four: each algorithm sorts 10 vectors of 5000 randomly generated integers between INT_MIN and INT_MAX. The algorithms are ranked based on the average time it takes to sort the data set; a new vector is generated for each run.
  5. Test five: the big data set. Each algorithm sorts 10 vectors of 90,000 integers ranging between 1 and 5000. The algorithms are ranked in the same manner as test four.
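
The timing itself boils down to wrapping each sort call in a high-resolution clock. The snippet below is a minimal sketch of that kind of harness – the helper names and structure here are illustrative, not the exact code from the repository:

#include <chrono>
#include <climits>
#include <random>
#include <vector>

// Sketch: build a vector of n random ints in [lo, hi].
std::vector<int> randomVector(std::size_t n, int lo = INT_MIN, int hi = INT_MAX) {
    static std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<int> dist(lo, hi);
    std::vector<int> v(n);
    for (auto& x : v) x = dist(rng);
    return v;
}

// Sketch: time a single sort call, in milliseconds.
// The vector is taken by value so every run starts from the same unsorted data.
template <typename Sorter>
double timeSort(Sorter sort, std::vector<int> v) {
    auto start = std::chrono::steady_clock::now();
    sort(v);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

Ranking is then just a matter of averaging the timings over the runs for each algorithm and sorting the averages.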

The Results

Generated Output:

---------------------------------------------------------------
Test one: Average of 10 runs, Each run performed on new random data set.
Sample size: 100 items of type: i
Rank    Algorithm    Time To Execute 
---------------------------------------
1:  Insertion sort - 0.00829ms
2:      Shell sort - 0.0144ms
3:    Bubble sort2 - 0.0646ms
4:  Selection sort - 0.0801ms
5:     Bubble sort - 0.0806ms
6: Selection sort+ - 0.0998ms
7:        Heapsort - 0.196ms
8:      Merge sort - 0.362ms
9:       Quicksort - 2.57ms
---------------------------------------------------------------
Test two: Average of 5 runs, All algorithms receive same data set.
Sample size: 3000 items of type: i
Rank    Algorithm    Time To Execute 
---------------------------------------
1:      Shell sort - 0.561ms
2:        Heapsort - 2.9ms
3:  Insertion sort - 5.05ms
4:      Merge sort - 6.81ms
5:       Quicksort - 16.9ms
6:    Bubble sort2 - 37.5ms
7:  Selection sort - 42.3ms
8:     Bubble sort - 44ms
9: Selection sort+ - 44.8ms
---------------------------------------------------------------
Test three: Average of 5 runs, All algorithms receive unique, limited varying data set.
Sample size: 3000 items of type: i
Rank    Algorithm    Time To Execute 
---------------------------------------
1:      Shell sort - 0.131ms
2:  Insertion sort - 0.463ms
3:       Quicksort - 0.698ms
4:        Heapsort - 1.58ms
5:      Merge sort - 1.97ms
6:    Bubble sort2 - 2.99ms
7:  Selection sort - 3.78ms
8:     Bubble sort - 4.31ms
9: Selection sort+ - 4.97ms
---------------------------------------------------------------
Test Four: Average of 10 runs, Each run performed on new random data set.
Sample size: 5000 items of type: i
Rank    Algorithm    Time To Execute 
---------------------------------------
1:      Shell sort - 1.68ms
2:        Heapsort - 10.2ms
3:  Insertion sort - 13.9ms
4:      Merge sort - 30.3ms
5:       Quicksort - 100ms
6:    Bubble sort2 - 187ms
7:  Selection sort - 219ms
8:     Bubble sort - 233ms
9: Selection sort+ - 280ms
---------------------------------------------------------------
Test Five: Average of 5 runs, All algorithms receive unique, somewhat limited varying data set.
Sample size: 90000 items of type: i
Rank    Algorithm    Time To Execute 
---------------------------------------
1:      Shell sort - 24.6ms
2:        Heapsort - 122ms
3:      Merge sort - 241ms
4:       Quicksort - 2.78e+03ms
5:  Insertion sort - 4.15e+03ms
6:    Bubble sort2 - 2.72e+04ms
7:  Selection sort - 3.12e+04ms
8:     Bubble sort - 3.37e+04ms
9: Selection sort+ - 3.86e+04ms

What was learned & possible improvements

As you can see, the ranking changes greatly based on the size and composition of the vector – this was expected. What was NOT expected, and REALLY surprised me, was just how well Shellsort fared. In essence, it ruled the day, coming in first in every test but the small data set, where it was very narrowly edged out by Insertion sort.

Another surprise was how much the modified Bubble sort increased performance over the original with such a minimal design change. Performance increased to the point where it consistently outperformed Selection sort, which came as an additional surprise.

The expected gains to Selection sort’s running time were not realized. In fact, the supposed “optimization” actually led to a DECREASE in performance – a testament to the value of profiling. I chalk this up to the extra comparisons performed on each pass, as well as the doubling of the number of swaps. The only gain was in the number of iterations of the outer loop, which was cut in half, and that didn’t come close to offsetting the costly extra swaps or the added comparisons.

Of the purported O(n log n) sorting algorithms, Heapsort was the clear winner, with Merge sort and Quicksort alternating rankings based on the makeup of the vector.

Merge sort didn’t come into its own until the number of items being sorted grew very large, though this was expected. Even in the tests where Merge sort shines, it still couldn’t muster the power to outperform Heapsort or the surprise winner of the day, Shellsort.

In the large data tests, where the divide-and-conquer algorithms were expected to reign supreme, Shellsort still sorted a vector of 90,000 items almost five times faster than the next runner-up, Heapsort, and a whopping TEN TIMES faster than the third-place Merge sort – just phenomenal.

Considering I used a generic gapping scheme and not a “tuned” gap sequence like Knuth discusses in The Art of Computer Programming, this speaks to the versatility of the algorithm. In fact, it performed so well that I went back multiple times to check that it was ACTUALLY sorting the data.
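
For reference, the generic gapping scheme I’m talking about is essentially the classic halve-the-gap-each-round approach. The sketch below captures the idea (it is not the exact code from the repository):

#include <vector>

// Sketch: Shellsort with the simple N/2, N/4, ..., 1 gap sequence (no tuned gaps).
void shellSort(std::vector<int>& v) {
    std::size_t n = v.size();
    for (std::size_t gap = n / 2; gap > 0; gap /= 2) {
        // Gapped insertion sort: elements `gap` apart form sorted subsequences.
        for (std::size_t i = gap; i < n; i++) {
            int tmp = v[i];
            std::size_t j = i;
            while (j >= gap && v[j - gap] > tmp) {
                v[j] = v[j - gap];
                j -= gap;
            }
            v[j] = tmp;
        }
    }
}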

The Shellsort heard around the world.

As well as Shellsort performed, Quicksort performed rather poorly. This came as a surprise, because Quicksort is extolled over and over in the literature as “the best of the bunch”. And while the Quicksort I used was the basic recursive implementation with an untuned pivot selection, it still should have fared better than it did, rarely rising above its worst-case O(n²) performance. Optimizing Quicksort to use a stack-based iterative approach and a median-of-medians pivot selection should yield the advertised O(n log n) behavior – though we saw just how much an “optimization” improved Selection sort!
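
For context, the kind of Quicksort being described is the textbook recursive version with a naive pivot, roughly like the sketch below (illustrative, not the exact code in the repo). With a partition like this, keys equal to the pivot all land on one side, which is one reason a narrow range such as 1–5000 over 90,000 items pushes it toward its worst case:

#include <algorithm> // std::swap
#include <vector>

// Sketch: Lomuto-style partition around the last element (an untuned pivot choice).
static int partition(std::vector<int>& v, int lo, int hi) {
    int pivot = v[hi];
    int i = lo;
    for (int j = lo; j < hi; j++) {
        if (v[j] < pivot) {
            std::swap(v[i], v[j]);
            i++;
        }
    }
    std::swap(v[i], v[hi]);
    return i;
}

// Sketch: basic recursive quicksort; it degenerates toward O(n^2) on sorted or
// duplicate-heavy input because the partitions become badly unbalanced.
static void quickSortBasic(std::vector<int>& v, int lo, int hi) {
    if (lo >= hi) return;
    int p = partition(v, lo, hi);
    quickSortBasic(v, lo, p - 1);
    quickSortBasic(v, p + 1, hi);
}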

One of the big takeaways from this study, which should not be forgotten, is this: when it comes to modifying a tried-and-true algorithm, buyer beware – a lot of research went into the design of the originals.

Honorable mention: Insertion sort. True to form, Insertion sort consistently scored better than the other “elementary sorts” (Bubble sort and Selection sort), as well as their allegedly optimized forms.

Also consistent with the literature: Insertion sort outperformed Selection sort, which outperformed Bubble sort.

Though its results are not shown here, I should take a moment to mention that std::sort() also performed EXTREMELY well, so chalk one up to Introsort. When you need to sort data, the standard library pretty much has you beat by a large margin compared to anything you can easily (and in some cases, not so easily) hand roll.
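
Trying that comparison yourself is a one-liner; something along these lines reproduces it (the size and range mirror test five, and the exact numbers will obviously vary by machine):

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <vector>

int main() {
    // 90,000 random ints in [1, 5000], mirroring the big data set test.
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<int> dist(1, 5000);
    std::vector<int> v(90000);
    for (auto& x : v) x = dist(rng);

    auto start = std::chrono::steady_clock::now();
    std::sort(v.begin(), v.end());   // Introsort in most standard library implementations
    auto stop = std::chrono::steady_clock::now();
    std::cout << "std::sort: "
              << std::chrono::duration<double, std::milli>(stop - start).count()
              << "ms\n";
    return 0;
}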

If you’re interested in seeing the implementations of the above algorithms, as well as how I profiled them, the full source code from this experiment is available on my GitHub:

https://github.com/maxgoren/Algorithms/blob/main/Sorting/C%2B%2B/shootout.cpp
