Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Near Sorting Algorithms - When to use?

From time to time I browse the web and look for interesting algorithms and datastructures to put into my bag of tricks. A year ago I came across the Soft Heap data-structure and learned about near sorting.

The idea behind this is that it's possible to break the O(n log n) barrier of compare based sorts if you can live with the fact that the sort algorithm cheats a bit. You get a almost sorted list but you have to live with some errors as well.

I played around with the algorithms in a test environement but never found a use for them.

So the question: Has anyone ever used near sorting in practice? If so in which kind of applications? Can you think up a use-case where near sorting is the right thing to do?

like image 372
Nils Pipenbrinck Avatar asked Sep 28 '08 15:09

Nils Pipenbrinck


People also ask

When should we use which sorting algorithm?

Quicksort is probably more effective for datasets that fit in memory. For larger data sets it proves to be inefficient so algorithms like merge sort are preferred in that case. Quick Sort in is an in-place sort (i.e. it doesn't require any extra storage) so it is appropriate to use it for arrays.

Which sorting algorithm is best when data is almost sorted?

​Many sorting algorithms are available, but the one which is best suited for the almost sorted array is the insertion sort.

Which sorting algorithm is best suitable for low memory system?

Among the sorting algorithms that we generally study in our data structure and algorithm courses, Selection Sort makes least number of writes (it makes O(n) swaps). But, Cycle Sort almost always makes less number of writes compared to Selection Sort.

Which sorting algorithm is best if it is already sorted & Why?

Insertion sort runs much more efficiently if the array is already sorted or "close to sorted." Selection sort always performs O(n) swaps, while insertion sort performs O(n2) swaps in the average and worst case. Selection sort is preferable if writing to memory is significantly more expensive than reading.


1 Answers

This is a total flying guess, but given the inherent subjectivity of "relevance" measures when sorting search results, I'd venture that it doesn't really matter whether or not they're perfectly sorted. The same could be said for recommendations. If you can somehow arrange that every other part of your algorithm for those things is O(n) then you might look to avoid a sort.

Be aware also that in the worst case your "nearly sorted" data does not meet one possible intuitive idea of "nearly sorted", which is that it has only a small number of inversions. The reason for this is just that if your data has only O(n) inversions, then you can finish sorting it in O(n) time using insertion sort or cocktail sort (i.e. two-way bubble sort). It follows that you cannot possibly have reached this point from completely unsorted, in O(n) time (using comparisons). So you're looking for applications where a majority subset of the data is sorted and the remainder is scattered, not for applications requiring that every element is close to its correct position.

like image 154
Steve Jessop Avatar answered Oct 03 '22 12:10

Steve Jessop