Pre-sorting analysis algorithm?

It's a well-known issue with Quicksort that when the data set is already sorted or nearly sorted, performance degrades horribly (toward O(N^2) with a naive pivot choice). In that case, Insertion Sort, which is normally very slow, is easily the best choice. The question is knowing when to use which.

Is there an algorithm available to run through a data set, apply a comparison factor, and return a report on how close the data set is to being in sort order? I prefer Delphi/Pascal, but I can read other languages if the example isn't overly complex.

Mason Wheeler asked Dec 04 '09 19:12



2 Answers

As you'd expect, quite a lot of thought goes into this. The median-of-three technique means that quicksort's worst-case behaviour doesn't occur for sorted data, but instead for less obvious cases.
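A minimal sketch of the median-of-three idea, in Python rather than Delphi to keep it short. The function names `median_of_three` and `quicksort` are illustrative only, and the partition scheme is one common choice, not the only one:

```python
def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid] and a[hi]."""
    mid = (lo + hi) // 2
    # Sort the three (value, index) pairs and take the middle one.
    return sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])[1][1]


def quicksort(a, lo=0, hi=None):
    """In-place quicksort that picks its pivot by median-of-three."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[median_of_three(a, lo, hi)]
    i, j = lo, hi
    # Hoare-style partition: move i right and j left until they cross.
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    quicksort(a, lo, j)
    quicksort(a, i, hi)
```

On already-sorted input the median of the first, middle and last elements is the middle element, so each partition splits the range roughly in half and the recursion stays shallow.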

Introsort is quite exciting, since it avoids quicksort's quadratic worst case altogether. Instead of your natural question, "how do I detect that the data is nearly sorted?", it in effect asks itself as it's going along, "is this taking too long?". If the answer is yes (in practice, the recursion depth has exceeded a limit proportional to log N), it switches from quicksort to heapsort.
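A rough sketch of that "is this taking too long?" check, assuming a depth budget of 2*log2(N), which is a conventional choice rather than a requirement. The names `introsort`, `_introsort` and `heapsort_slice` are hypothetical. The heapsort here copies the slice for brevity, and the pivot is deliberately the naive first element so that sorted input triggers the fallback:

```python
import heapq
import math


def heapsort_slice(a, lo, hi):
    """Sort a[lo..hi] with a binary heap (O(N log N) in the worst case).

    Copies the slice for simplicity; a real introsort heapsorts in place.
    """
    heap = a[lo:hi + 1]
    heapq.heapify(heap)
    for k in range(lo, hi + 1):
        a[k] = heapq.heappop(heap)


def introsort(a):
    """Quicksort that falls back to heapsort when recursion gets too deep."""
    # Assumed depth budget: 2 * floor(log2(N)), at least 2.
    limit = 2 * max(1, int(math.log2(len(a)))) if a else 0
    _introsort(a, 0, len(a) - 1, limit)


def _introsort(a, lo, hi, depth_left):
    if lo >= hi:
        return
    if depth_left == 0:
        # Too deep: stop partitioning and finish this range with heapsort.
        heapsort_slice(a, lo, hi)
        return
    pivot = a[lo]  # naive pivot: worst case for plain quicksort on sorted data
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    _introsort(a, lo, j, depth_left - 1)
    _introsort(a, i, hi, depth_left - 1)
```

With this pivot, plain quicksort on a few thousand sorted items would recurse once per element. The depth budget cuts that off after a couple of dozen levels and hands the rest to heapsort.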

Timsort combines merge sort with insertion sort, and performs very well on sorted or reverse-sorted data, and on data that includes sorted or reverse-sorted subsets.
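Timsort gets this behaviour by scanning for runs that are already ascending or strictly descending and merging them. The sketch below only counts such runs; `count_runs` is a hypothetical helper, not part of any Timsort implementation, and real Timsort also pads short runs with insertion sort. The count works as a crude one-pass sortedness report: 1 means the data is already sorted or reverse-sorted, and a value near N/2 means there is little existing order to exploit.

```python
def count_runs(a):
    """Count maximal runs that are non-decreasing or strictly decreasing."""
    n = len(a)
    runs = 0
    i = 0
    while i < n:
        runs += 1
        j = i + 1
        if j < n:
            if a[j] < a[i]:
                # Strictly descending run (strict, so reversing it is stable).
                while j < n and a[j] < a[j - 1]:
                    j += 1
            else:
                # Non-decreasing run.
                while j < n and a[j] >= a[j - 1]:
                    j += 1
        i = j  # the next run starts where this one ended
    return runs
```

For example, `count_runs([1, 2, 3, 2, 1, 5, 6])` sees `[1, 2, 3]`, `[2, 1]` and `[5, 6]`.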

So probably the answer to your question is, "you don't need a pre-pass analysis, you need an adaptive sort algorithm".

Steve Jessop answered Oct 19 '22 06:10


There's also Smoothsort, which is apparently quite tricky to implement, but its running time varies between O(N log N) and O(N) depending on how sorted the data is to start with.

http://en.wikipedia.org/wiki/Smoothsort

Long tricky PDF: http://www.cs.utexas.edu/users/EWD/ewd07xx/EWD796a.PDF

However, if your data is truly huge and you have to access it serially, mergesort is probably the best. It's always O(N log N) and it has excellent 'locality' properties.
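A small sketch of why mergesort suits serial access: read the input once in order, sort fixed-size chunks, then merge the sorted chunks sequentially. The names `sorted_chunks` and `external_merge_sort` and the `chunk_size` parameter are illustrative assumptions. A real external sort would write each chunk to a temporary file instead of keeping it in a list.

```python
import heapq
from itertools import islice


def sorted_chunks(stream, chunk_size):
    """Read the stream sequentially, yielding sorted chunks of up to chunk_size items."""
    it = iter(stream)
    while True:
        chunk = sorted(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def external_merge_sort(stream, chunk_size=4):
    """Return an iterator over the stream's items in sorted order."""
    # In a real external sort each chunk would live on disk, not in memory.
    chunks = list(sorted_chunks(stream, chunk_size))
    # heapq.merge reads every chunk front to back, one item at a time.
    return heapq.merge(*chunks)
```

Usage: `list(external_merge_sort(iter([5, 3, 8, 1, 9, 2, 7]), chunk_size=3))` yields the items in ascending order while only ever reading the input and each chunk from front to back.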

wowest answered Oct 19 '22 07:10