I have an array of elements. This array could be:

* sorted ascending,
* sorted descending,
* nearly sorted ascending*, or
* nearly sorted descending*.
But I do not know (in advance) which of these cases applies. I would prefer to sort the array into whichever order it is already close to.
It does not matter whether the output is ascending or descending, but it must be one or the other, so I can perform a binary search on it (see the sketch below).
The sort need not be stable.
Some background info: The process goes roughly like this:

1. Sort the array by key A.
2. Do some work (binary searches, etc.).
3. Sort the same array by key B.
4. Do some more work.
5. Sort it by key C, and so on.
A and B are often correlated with each other (either positively or negatively). The same applies to B and C. Occasionally A == C.
* "nearly sorted" here means most elements are close to their final positions, but rarely exactly at them (there is a lot of additive noise, and few long sorted subsequences). Still, there are usually a few "outliers" at the start and end of the array which are poor predictors of the order for the next sort.
Is there an algorithm that can take advantage of the fact that I have no preference for ascending vs. descending to sort more cheaply, compared to the Timsort I am currently using?
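For reference, the binary search afterwards only needs to know which direction was chosen; a minimal sketch of such a direction-agnostic lookup (the `ascending` flag is illustrative):

```python
def binary_search(arr, x, ascending=True):
    """Locate x in arr, which is sorted either ascending or descending."""
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        # Flip the comparison when the array runs descending.
        goes_before_x = arr[mid] < x if ascending else arr[mid] > x
        if goes_before_x:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(arr) and arr[lo] == x else -1
```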
Among the classical sorting algorithms, heap sort will do well when the input turns out to be (almost) sorted in descending order, since the max-heap construction phase then involves (almost) no swaps (whereas the most swaps occur when the input is already sorted ascending).
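One way to see that claim concretely is to count the swaps made by the standard bottom-up heap construction; a minimal sketch (the names are mine):

```python
def build_max_heap(a):
    """Bottom-up max-heap construction; returns the number of swaps performed."""
    n, swaps = len(a), 0
    for i in range(n // 2 - 1, -1, -1):
        while True:  # sift a[i] down until the max-heap property holds
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and a[child] > a[largest]:
                    largest = child
            if largest == i:
                break
            a[i], a[largest] = a[largest], a[i]
            swaps += 1
            i = largest
    return swaps

print(build_max_heap(list(range(100, 0, -1))))  # descending input -> 0 swaps
print(build_max_heap(list(range(1, 101))))      # ascending input  -> many swaps
```

Note that only the construction phase gets cheaper; the extraction phase still does O(n log n) work, so the saving is in the constant factor rather than the asymptotics.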
I'd continue using Timsort (however, a good alternative is Smoothsort*), but first probe the array to decide whether to sort in ascending or descending order: look at the first and last elements and sort accordingly. If the array is unsorted, the choice is immaterial; if it is (partially) sorted, probing at a wide interval is more likely to detect the direction correctly.
*Smoothsort has the same best, average, and worst-case time as Timsort (O(n) best, O(n log n) average and worst), and better space complexity (O(1) versus Timsort's O(n)). Like Timsort, it was specifically designed to take advantage of partially sorted data.
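In Python terms (where `list.sort` is Timsort), the probing idea might look like the sketch below; returning the chosen direction, so later binary searches know which way the array runs, is my own addition:

```python
def sort_toward_existing_order(items):
    """Sort items in place, in whichever direction the endpoints suggest.

    Probing the two endpoints is the wide-interval heuristic: on
    (partially) sorted input it usually picks the direction the array
    already leans toward; on unsorted input the choice is immaterial.
    Returns True if the result is descending, False if ascending.
    """
    descending = len(items) > 1 and items[0] > items[-1]
    items.sort(reverse=descending)  # CPython's list.sort is Timsort
    return descending
```

If the outliers mentioned in the question are a concern, probing a pair of elements a little way in from each end (say `items[5]` and `items[-6]`) is a natural variation.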
Another possibility to consider: insertion-sort a small fixed number of elements at the front of the array, and track what fraction of the comparisons resolve as "ascending". If that fraction is close to 1, sort the whole array ascending; if it is close to 0, sort it descending; anything in between, treat the array as unsorted and pick either direction.
The "small fixed number" can be a number for which insertion sort is fairly fast even in bad cases. I would guess 10-20 or so. It's possible to work out the probability of a false positive in uniformly shuffled data for any given number of insertions and any given threshold of "close to 0/1", but I'm too lazy.
You say the first and last few array elements typically buck the trend, in which case you could exclude them from the initial test insertion sort.
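A sketch of how that probe might look; the window size, the number of skipped outliers, the threshold, and the particular "fraction of ascending comparisons" statistic are all illustrative guesses, not tuned values:

```python
def probe_direction(arr, k=16, skip=3, threshold=0.9):
    """Insertion-sort a copy of a small window of arr, counting how many
    comparisons resolve as "ascending". Returns +1 (looks ascending),
    -1 (looks descending), or 0 (no clear trend)."""
    window = arr[skip:skip + k]        # leave out the leading outliers
    asc = total = 0
    for i in range(1, len(window)):
        x, j = window[i], i - 1
        while j >= 0:
            total += 1
            if window[j] > x:          # this comparison says "descending"
                window[j + 1] = window[j]
                j -= 1
            else:                      # this comparison says "ascending"
                asc += 1
                break
        window[j + 1] = x
    if total == 0:
        return 0
    frac = asc / total
    if frac >= threshold:
        return 1
    if frac <= 1 - threshold:
        return -1
    return 0


def adaptive_sort(arr):
    direction = probe_direction(arr)
    arr.sort(reverse=(direction == -1))  # ascending when there is no trend
    return direction
```

On data close to one big ascending run the fraction lands near 1 (near 0 for a descending run), while uniformly shuffled data drifts toward the middle and falls back to an ordinary ascending sort.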
Obviously this approach is somewhat inspired by Timsort. But Timsort is fiendishly optimized for data that contains runs -- I have tried to fiendishly optimize only for data that's close to one big run (in either direction). Another feature of Timsort is that it's well tested; I don't claim to share that.