Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which sort algorithm works best on mostly sorted data? [closed]

Based on the highly scientific method of watching animated gifs I would say Insertion and Bubble sorts are good candidates.


Only a few items => INSERTION SORT

Items are mostly sorted already => INSERTION SORT

Concerned about worst-case scenarios => HEAP SORT

Interested in a good average-case result => QUICKSORT

Items are drawn from a dense universe => BUCKET SORT

Desire to write as little code as possible => INSERTION SORT


timsort

Timsort is "an adaptive, stable, natural mergesort" with "supernatural performance on many kinds of partially ordered arrays (less than lg(N!) comparisons needed, and as few as N-1)". Python's built-in sort() has used this algorithm for some time, apparently with good results. It's specifically designed to detect and take advantage of partially sorted subsequences in the input, which often occur in real datasets. It is often the case in the real world that comparisons are much more expensive than swapping items in a list, since one typically just swaps pointers, which very often makes timsort an excellent choice. However, if you know that your comparisons are always very cheap (writing a toy program to sort 32-bit integers, for instance), other algorithms exist that are likely to perform better. The easiest way to take advantage of timsort is of course to use Python, but since Python is open source you might also be able to borrow the code. Alternately, the description above contains more than enough detail to write your own implementation.


Insertion sort with the following behavior:

  1. For each element k in slots 1..n, first check whether el[k] >= el[k-1]. If so, go to next element. (Obviously skip the first element.)
  2. If not, use binary-search in elements 1..k-1 to determine the insertion location, then scoot the elements over. (You might do this only if k>T where T is some threshold value; with small k this is overkill.)

This method makes the least number of comparisons.