Sunday, January 26, 2014

Battle of sorting algorithms (Which sorting algorithm is the best?)

I was recently going through MIT Opencourseware's Algorithms course by professor Erik Demaine and Charles Leiserson (fantastic teaching by the way!). After finishing up the lectures on sorting and worse case linear order statistics, an innocent question to ask is "Which sorting algorithm is best?". Mind you, the lectures are exhaustive and walk you through the running time of almost all discussed algorithms. But there are constraints and involvement of constants which hinder the practical applicability of these algorithms. In this post, i wish to delineate the algorithm selection mystery assuming you are sorting numbers. (You can extend the below mentioned analysis for floating points, strings and so on).

Algorithms based on comparing a pair of numbers and minimizing the number of comparisons have a tight lower bound of (n log n). This means that nlogn is the worst case runtime of the best comparison based sorting algorithm. And the best algorithm, in practice, is randomized quick sort.The advantage of randomized quick sort is that it avoids the worst case of quick sort (which might take upto O(n^2) time for selected inputs such as sorted or a reverse sorted array).

So, is randomized quick sort the answer to all problems?

As the lecture demonstrates, we can sort in linear time too. With algorithms such as counting sort and radix sort, it is quite possible to get a upper bound of O(n), where n is the number of elements to be sorted. These algorithms do not fall under the comparison based model.

So, are counting sort and radix sort the answer to all problems?

Not quite. Each of them is useful in their own way. For example, after going through the rigorous analysis presented in the lecture, i've come up with a criteria of when to use each of them:

1) Counting sort (Stable): Use it when

     i) You know the maximum number in the elements to be sorted.
     ii) All numbers are integers
     iii) You can afford to allocate two auxiliary storages of size equal to the number of elements.
     iv) The numbers themselves are not very large (perhaps less than 10,000). This limit can be increased if you can write a multi-threaded version of counting sort where you sort the numbers in parallel.

2) Radix sort (MSD implementations can be stable): Use it when

     i)  You know the range of numbers to be sorted
     ii) Elements to be sorted consists of only integers, fixed size strings or anything which can be converted to a fixed size integer.
     iii) The number of digits in each integer is less than log n where base of the log is the base of the number to be sorted. For example, if we desire to sort 10^6 integers which are all in base 10, then we'll check if log 10^6 base 10 is less than the number of digits in each integer. If yes, we'll go ahead with the sort. The logic of this check is explained in the lecture found here.

3) Randomized quick sort: Use it when

     i) When 1) and 2) cannot be applied.
     ii) You want to your algorithm to work well with cache.
Technorati Tags: MIT, algorithms, sorting, counting-sort, radix-sort, best-sorting-algorithm, quick-sort, randomized-quick-sort

No comments:

Post a Comment