Pre-sorting analysis algorithm?

Tags: algorithm, sorting, delphi, analysis

It's a well-known issue with Quicksort that when the data set is in or almost in sort order, performance degrades horribly. In this case, Insertion Sort, which is normally very slow, is easily the best choice. The question is knowing when to use which.

Is there an algorithm available to run through a data set, apply a comparison factor, and return a report on how close the data set is to being in sort order? I prefer Delphi/Pascal, but I can read other languages if the example isn't overly complex.

asked Dec 04 '09 19:12

Mason Wheeler

2 Answers

As you'd expect, quite a lot of thought has gone into this. With median-of-three pivot selection, quicksort's worst-case behaviour no longer occurs for sorted data, but for less obvious inputs instead.
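To make the median-of-three idea concrete, here is a minimal sketch in Python. The names `median_of_three` and `quicksort` are just illustrative, and the partition scheme (Hoare-style) is one choice among several rather than anything prescribed above.

```python
def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], a[hi]."""
    mid = (lo + hi) // 2
    # Sort the three (value, index) pairs and take the middle one.
    return sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])[1][1]


def quicksort(a, lo=0, hi=None):
    """Sort list a in place using a median-of-three pivot."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        pivot = a[median_of_three(a, lo, hi)]
        i, j = lo, hi
        # Hoare-style partition around the pivot value.
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth logarithmic.
        if j - lo < hi - i:
            quicksort(a, lo, j)
            lo = i
        else:
            quicksort(a, i, hi)
            hi = j
```

On already-sorted input the median of the first, middle and last elements is the true median of the range, so every partition splits evenly instead of degenerating.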

Introsort is quite exciting, since it avoids quicksort's quadratic worst case altogether. Instead of your natural question, "how do I detect that the data is nearly-sorted", it in effect asks itself as it's going along, "is this taking too long?" (in practice: has the recursion gone deeper than some multiple of log N?). If the answer is yes, it switches from quicksort to heapsort.
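A rough sketch of that "is this taking too long?" check, in Python. This is a simplification under a few assumptions: the depth budget of 2 * log2(N) is a commonly used value, the pivot is deliberately the naive first element so that sorted input actually triggers the fallback, and the heapsort step leans on the standard `heapq` module. Production introsorts usually also finish small partitions with insertion sort, which is omitted here.

```python
import heapq
from math import log2


def introsort(a):
    """Sort list a in place: quicksort with a heapsort fallback."""
    if len(a) > 1:
        _intro(a, 0, len(a) - 1, 2 * int(log2(len(a))))


def _intro(a, lo, hi, depth_left):
    if hi <= lo:
        return
    if depth_left == 0:
        # The recursion has gone too deep: heapsort this slice instead.
        heap = a[lo:hi + 1]
        heapq.heapify(heap)
        a[lo:hi + 1] = [heapq.heappop(heap) for _ in range(len(heap))]
        return
    pivot = a[lo]  # naive pivot, chosen to provoke the bad case
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    _intro(a, lo, j, depth_left - 1)
    _intro(a, i, hi, depth_left - 1)
```

Because the depth budget bounds the recursion, sorted input costs O(N log N) here even with the worst possible pivot choice.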

Timsort combines merge sort with insertion sort, and performs very well on sorted or reverse-sorted data, and on data that includes sorted or reverse-sorted subsets.
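The run detection that Timsort relies on also gives a rough answer to the "how close to sorted is it?" question. The following Python sketch is not Timsort itself (the real thing adds minimum run lengths, galloping and the merging on top); `count_runs` is a hypothetical helper that makes a single O(N) pass and counts maximal runs that are either non-decreasing or strictly decreasing.

```python
def count_runs(a):
    """Count maximal non-decreasing or strictly decreasing runs in a."""
    n = len(a)
    runs = 0
    i = 0
    while i < n:
        runs += 1
        j = i + 1
        if j < n and a[j] < a[i]:
            # Strictly descending run (strict, so reversing it stays stable).
            while j < n and a[j] < a[j - 1]:
                j += 1
        else:
            # Non-decreasing run.
            while j < n and a[j] >= a[j - 1]:
                j += 1
        i = j
    return runs
```

A result of 1 means the data is already sorted or reverse-sorted; a result close to N/2 means there is essentially no existing order to exploit. Where to draw the line between "use insertion sort" and "use quicksort" is something you would have to tune by measurement.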

So probably the answer to your question is, "you don't need a pre-pass analysis, you need an adaptive sort algorithm".

answered Oct 19 '22 06:10

Steve Jessop

There's also SmoothSort, which is apparently quite tricky to implement, but its running time varies between O(N log N) and O(N) depending on how sorted the data is to start with.

http://en.wikipedia.org/wiki/Smoothsort

Long tricky PDF: http://www.cs.utexas.edu/users/EWD/ewd07xx/EWD796a.PDF

However, if your data is truly huge and you have to access it serially, mergesort is probably the best. It's always O(N log N) and it has excellent 'locality' properties.
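To illustrate the serial-access point, here is a minimal top-down mergesort sketch in Python (function names are illustrative). The `merge` step only ever reads each input from front to back, which is why the same idea carries over to files or tapes; a real external sort would merge chunks from disk rather than in-memory lists.

```python
def merge(left, right):
    """Merge two sorted lists, reading each strictly front to back."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the left on ties, which keeps the sort stable.
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(a):
    """Return a new sorted list; O(N log N) regardless of input order."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))
```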

answered Oct 19 '22 07:10

wowest
