I know how to implement most of these algorithms, but I don't know what data-set sizes (and what kinds of data) each of them is best suited for:
A sorting algorithm rearranges a given array or list of elements according to a comparison operator, which decides the new order of the elements. For example, a list of customer names could be sorted into alphabetical order by surname, or a list of people could be put into numerical order by age.

Some real-world examples: bubble sort has been used in TV software to sort channels by audience viewing time, and databases use external merge sort to sort data sets that are too large to be loaded entirely into memory.
First of all, you take all the sorting algorithms that have O(n²) complexity and throw them away.
Then, you have to study several properties of your sorting algorithms and decide which of them is better suited for the problem you want to solve. The most important are:
Is the algorithm in-place? This means that the sorting algorithm uses no extra memory beyond O(1). This property is very important when you are running memory-critical applications.
Bubble-sort, Insertion-sort and Selection-sort use constant memory. There is an in-place variant for Merge-sort too.
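For illustration, here is a minimal in-place insertion sort in Python (the function name and sample list are just for the example). It rearranges the list within itself, using only O(1) extra memory:

```python
def insertion_sort(a):
    """Sort list `a` in place using O(1) extra memory."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements one slot to the right.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

data = [5, 2, 4, 6, 1, 3]
insertion_sort(data)
print(data)  # [1, 2, 3, 4, 5, 6]
```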
Is the algorithm stable? This means that if two elements x and y are equal given your comparison method, and in the input x is found before y, then in the output x will be found before y.
Merge-sort, Bubble-sort and Insertion-sort are stable.
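You can see stability directly with Python's built-in sort, which is stable (it is Timsort, a merge-sort hybrid); the records below are made up for the example:

```python
# Records as (name, age); two people share age 25.
people = [("Ada", 30), ("Bob", 25), ("Cyd", 25), ("Dee", 20)]

# A stable sort keeps Bob ahead of Cyd, because he came
# first in the input and their keys compare equal.
by_age = sorted(people, key=lambda p: p[1])
print(by_age)
# [('Dee', 20), ('Bob', 25), ('Cyd', 25), ('Ada', 30)]
```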
Can the algorithm be parallelized? If the application you are building can make use of parallel computation, you might want to choose parallelizable sorting algorithms.
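As a rough sketch of the idea, assuming you simply sort chunks in worker processes and then merge the sorted runs (the worker count of 4 is arbitrary):

```python
import heapq
from concurrent.futures import ProcessPoolExecutor

def parallel_sort(data, workers=4):
    """Sort chunks in parallel, then merge the sorted runs."""
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, chunks))
    # heapq.merge lazily merges already-sorted iterables.
    return list(heapq.merge(*runs))

if __name__ == "__main__":
    import random
    data = [random.randint(0, 10**6) for _ in range(100_000)]
    assert parallel_sort(data) == sorted(data)
```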
Use Bubble Sort only when the data to be sorted is stored on rotating drum memory. It's optimal for that purpose, but not for random-access memory. These days, that amounts to "don't use Bubble Sort".
Use Insertion Sort or Selection Sort up to some size that you determine by testing it against the other sorts you have available. This usually works out to be around 20-30 items, but YMMV. In particular, when implementing divide-and-conquer sorts like Merge Sort and Quick Sort, you should "break out" to an Insertion sort or a Selection sort when your current block of data is small enough.
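A minimal sketch of that "break out" pattern, assuming a quicksort with a Lomuto partition; the cutoff of 24 is a guess, to be tuned by benchmarking on your own hardware:

```python
CUTOFF = 24  # assumed threshold; measure to find yours

def insertion_sort(a, lo, hi):
    """Insertion-sort the slice a[lo..hi] in place."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def hybrid_quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        # Small block: break out to insertion sort.
        if hi - lo < CUTOFF:
            insertion_sort(a, lo, hi)
            return
        # Lomuto partition with the middle element as pivot.
        mid = (lo + hi) // 2
        a[mid], a[hi] = a[hi], a[mid]
        pivot, p = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[p] = a[p], a[i]
                p += 1
        a[p], a[hi] = a[hi], a[p]
        # Recurse on the smaller side, loop on the larger.
        if p - lo < hi - p:
            hybrid_quicksort(a, lo, p - 1)
            lo = p + 1
        else:
            hybrid_quicksort(a, p + 1, hi)
            hi = p - 1
```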
Also use Insertion Sort on nearly-sorted data, for example if you somehow know that your data used to be sorted, and hasn't changed very much since.
Use Merge Sort when you need a stable sort (it's also good for sorting linked lists), but beware that for arrays it uses significant additional memory.
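Here is a sketch of merge sort on a singly linked list (the Node class is made up for the example); splitting and merging only rewire pointers, so no array-sized buffer is needed, and the `<=` in the merge keeps it stable:

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def merge_sort(head):
    if head is None or head.next is None:
        return head
    # Split the list in half with slow/fast pointers.
    slow, fast = head, head.next
    while fast and fast.next:
        slow, fast = slow.next, fast.next.next
    mid, slow.next = slow.next, None
    left, right = merge_sort(head), merge_sort(mid)
    # Merge the two sorted halves; `<=` preserves stability.
    dummy = tail = Node(None)
    while left and right:
        if left.value <= right.value:
            tail.next, left = left, left.next
        else:
            tail.next, right = right, right.next
        tail = tail.next
    tail.next = left or right
    return dummy.next
```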
Generally you don't use "plain" Quick Sort at all, because even with intelligent choice of pivots it still has an Ω(n²) worst case, and unlike Insertion Sort it doesn't have any useful best cases. The "killer" cases can be constructed systematically, so if you're sorting "untrusted" data then some user could deliberately kill your performance, and anyway there might be some domain-specific reason why your data approximates a killer case. If you choose random pivots then the probability of hitting a killer case is negligible, so that's an option, but the usual approach is "IntroSort": a Quick Sort that detects bad cases and switches to Heap Sort.
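A minimal IntroSort sketch, assuming a depth limit of about 2·log2(n) and using heapq as a stand-in heapsort (simple, though not the fastest in-place form):

```python
import heapq
import math

def introsort(a):
    limit = 2 * max(1, int(math.log2(len(a) or 1)))
    _introsort(a, 0, len(a) - 1, limit)

def _heapsort(a, lo, hi):
    # Heap-sort the slice a[lo..hi] via heapq.
    chunk = a[lo:hi + 1]
    heapq.heapify(chunk)
    for i in range(lo, hi + 1):
        a[i] = heapq.heappop(chunk)

def _introsort(a, lo, hi, depth):
    if lo >= hi:
        return
    # Too deep: we keep hitting bad pivots, so bail out to heapsort.
    if depth == 0:
        _heapsort(a, lo, hi)
        return
    # Lomuto partition with the middle element as pivot.
    mid = (lo + hi) // 2
    a[mid], a[hi] = a[hi], a[mid]
    pivot, p = a[hi], lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[p] = a[p], a[i]
            p += 1
    a[p], a[hi] = a[hi], a[p]
    _introsort(a, lo, p - 1, depth - 1)
    _introsort(a, p + 1, hi, depth - 1)
```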
Radix Sort is a bit of an oddball. It's difficult to find common problems for which it is best, but it has a good asymptotic limit for fixed-width data (O(n), where comparison sorts are Ω(n log n)). If your data is fixed-width, and the input is larger than the number of possible values (for example, more than 4 billion 32-bit integers), then there starts to be a chance that some variety of radix sort will perform well.
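For example, here is a sketch of LSD radix sort for non-negative 32-bit integers, done as four byte-wise passes; each pass is a stable bucket distribution on one byte:

```python
def radix_sort_u32(a):
    """Sort non-negative 32-bit ints in four 8-bit passes."""
    for shift in (0, 8, 16, 24):
        buckets = [[] for _ in range(256)]
        for x in a:
            buckets[(x >> shift) & 0xFF].append(x)
        # Concatenating buckets in order keeps each pass stable.
        a = [x for bucket in buckets for x in bucket]
    return a

print(radix_sort_u32([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```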