From time to time I browse the web looking for interesting algorithms and data structures to put into my bag of tricks. A year ago I came across the Soft Heap data structure and learned about near sorting.
The idea behind this is that it's possible to break the O(n log n) barrier of comparison-based sorting if you can live with the fact that the algorithm cheats a bit: you get an almost sorted list, but you have to live with some errors as well.
I played around with the algorithms in a test environment but never found a use for them.
So the question: has anyone ever used near sorting in practice? If so, in what kind of applications? Can you think of a use case where near sorting is the right thing to do?
Quicksort is probably more effective for datasets that fit in memory. For larger datasets it proves inefficient, so algorithms like merge sort are preferred in that case. Quicksort is an in-place sort (i.e. it doesn't require any extra storage), so it is appropriate for arrays.
Many sorting algorithms are available, but the one best suited to an almost sorted array is insertion sort.
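To illustrate (my own sketch, not code from the answer): insertion sort's inner loop only walks past elements that are out of order, so on a nearly sorted input it does close to n comparisons in total.

```python
def insertion_sort(a):
    """Sort the list a in place and return it.

    Runs in O(n + k) time, where k is the number of inversions,
    so it is near-linear on nearly sorted input.
    """
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements one slot right until key fits.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

# A nearly sorted list: only two adjacent pairs are swapped.
print(insertion_sort([1, 3, 2, 4, 6, 5, 7]))
```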
Among the sorting algorithms generally studied in data structures and algorithms courses, selection sort makes the fewest writes (it makes O(n) swaps). But cycle sort almost always makes fewer writes than selection sort.
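Here's a sketch of cycle sort in plain Python (my own illustration, not part of the original answer). It counts the writes it performs, so the claim can be checked against selection sort's swap count on a given input; each element is written at most once into its final slot.

```python
def cycle_sort(a):
    """Sort the list a in place, minimizing writes to the array.

    Returns the number of array writes performed.
    """
    writes = 0
    for start in range(len(a) - 1):
        item = a[start]
        # Find item's final position by counting smaller elements after it.
        pos = start
        for i in range(start + 1, len(a)):
            if a[i] < item:
                pos += 1
        if pos == start:
            continue  # already in place, no write needed
        while item == a[pos]:  # skip over equal elements
            pos += 1
        a[pos], item = item, a[pos]
        writes += 1
        # Rotate the rest of the cycle until we return to start.
        while pos != start:
            pos = start
            for i in range(start + 1, len(a)):
                if a[i] < item:
                    pos += 1
            while item == a[pos]:
                pos += 1
            a[pos], item = item, a[pos]
            writes += 1
    return writes
```

On an already sorted input it performs zero writes, whereas selection sort implementations typically still execute their swap each pass.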
Insertion sort runs much more efficiently if the array is already sorted or close to sorted. Selection sort always performs O(n) swaps, while insertion sort performs O(n²) swaps in the average and worst cases. Selection sort is preferable when writing to memory is significantly more expensive than reading.
This is a total flying guess, but given the inherent subjectivity of "relevance" measures when sorting search results, I'd venture that it doesn't really matter whether or not they're perfectly sorted. The same could be said for recommendations. If you can somehow arrange that every other part of your algorithm for those things is O(n) then you might look to avoid a sort.
Be aware also that in the worst case your "nearly sorted" data does not match one intuitive idea of "nearly sorted", namely that it has only a small number of inversions. The reason is that if your data has only O(n) inversions, you can finish sorting it in O(n) time using insertion sort or cocktail sort (i.e. two-way bubble sort). It follows that you cannot possibly have reached this point from completely unsorted data in O(n) time (using comparisons), since that would beat the Ω(n log n) lower bound for a full sort. So you're looking for applications where a large subset of the data is already sorted and the remainder is scattered, not for applications requiring that every element be close to its correct position.
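To make the inversion count concrete, here is a small self-contained check (my own sketch): the number of element shifts insertion sort performs equals the number of inversions in the input, so O(n) inversions means only O(n) finishing work.

```python
def count_inversions(a):
    """Brute-force O(n^2) count of pairs (i, j) with i < j and a[i] > a[j]."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])

def insertion_sort_shifts(a):
    """Insertion sort a in place; return the number of element shifts."""
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]  # each shift undoes exactly one inversion
            j -= 1
            shifts += 1
        a[j + 1] = key
    return shifts

data = [1, 3, 2, 5, 4, 6]  # two inversions: (3, 2) and (5, 4)
print(count_inversions(data))         # 2
print(insertion_sort_shifts(data))    # 2 — shifts == inversions
```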