I have came up with several strategies, but I am not entirely sure how they affect the overall behavior. I know the average case is O(NlogN), so I would assume that would be in the answer somewhere. I want to just put NlogN+1 for if I just select the 1st item in the array as the the pivot for the quicksort, but I don't know whether that is either correct nor acceptable? If anyone could enlighten me on this subject that would be great. Thanks! Possible Strategies: a) Array is random: pick the first item since that is the most cost effective choice. b) Array is mostly sorted: pick middle item so we are likely to compliment the binary recursion of splitting in half each time. c) Array is relatively large: pick first, middle and last indexes in array and compare them, picking the smallest to ensure we avoid worst case. d) Perform 'c' with randomly generated indexes to make selection less deterministic.

An important fact you should know is that in an array of distinct elements, quicksort with a random choice of partition will run in O(n lg n). There are many good proofs of this, and the one on Wikipedia actually has a pretty good discussion of this. If you're willing to go for a slightly less formal proof that's mostly mathematically sound, the intuition goes as follows. Whenever we pick a pivot, let's say that a "good" pivot is a pivot that gives us at least a 75%/25% split; that is, it's greater than at least 25% of the elements and at most 75% of the elements. We want to bound the number of times that we can get a pivot of this sort before the algorithm terminates. Suppose that we get k splits of this sort and consider the size of the largest subproblem generated this way. It has size at most (3/4)kn, since on each iteration we're getting rid of at least a quarter of the elements. If we consider the specific case where k = log3/4 (1/n) = log4/3 n, then the size of the largest subproblem after k good pivots are chosen will be 1, and the recursion will stop. This means that if we choose get O(lg n) good pivots, the recursion will terminate. But on each iteration, what's the chance of getting such a pivot? Well, if we pick the pivot randomly, then there's a 50% chance that it's in the middle 50% of the elements, and so on expectation we'll choose two random pivots before we get a good pivot. Each step of choosing a pivot takes O(n) time, and so we should spend roughly O(n) time before getting each good pivot. Since we get at most O(lg n) good pivots, the overall runtime is O(n lg n) on expectation. An important detail in the above discussion is that if you replace the 75%/25% split with any constant split - say, a (100 - k%) / k% split - the over asymptotic analysis is the same. You'll get that quicksort takes, on average, O(n lg n) time. The reason that I've mentioned this proof is that it gives you a good framework for thinking about how to choose a pivot in quicksort. If you can pick a pivot that's pretty close to the middle on each iteartion, you can guarantee O(n lg n) runtime. If you can't guarantee that you'll get a good pivot on any iteration, but can say that on expectation it takes only a constant number of iterations before you get a good pivot, then you can also guarantee O(n lg n) expected runtime. Given this, let's take a look at your proposed pivot schemes. For (a), if the array is random, picking the first element as the pivot is essentially the same as picking a random pivot at each step, and so by the above analysis you'll get O(n lg n) runtime on expectation. For (b), if you know that the array is mostly sorted, then picking the median is a good strategy. The reason is that if we can say that each element is "pretty close" to where it should be in the sorted sequence, then you can make an argument that every pivot you choose is a good pivot, giving you the O(n lg n) runtime you want. (The term "pretty close" isn't very mathematically precise, but I think you could formalize this without too much difficulty if you wanted to). As for (c) and (d), of the two, (d) is the only one guaranteed to get O(n lg n) on expectation. If you deterministically pick certain elements to use as pivots, your algorithm will be vulnerable to deterministic sequences that can degenerate it to O(n2) behavior. There's actually a really interesting paper on this called "A Killer Adversary for Quicksort" by McIlroy that describes how you can take any deterministic quicksort and construct a pathologically worst-case input for it by using a malicious comparison function. You almost certainly want to avoid this in any real quicksort implementation, since otherwise malicious users could launch DoS attacks on your code by feeding in these killer sequences to force your program to sort in quadratic time and thus hang. On the other hand, because (d) is picking its sample points randomly, it is not vulnerable to this attack, because on any sequence the choice of pivots is random. Interestingly, though, for (d), while it doesn't hurt to pick three random elements and take the median, you don't need to do this. The earlier proof is enough to show that you'll get O(n lg n) on expectation with a single random pivot choice. I actually don't know if picking the median of three random values will improve the performance of the quicksort algorithm, though since quicksort is always Ω(n lg n) it certainly won't be asymptotically any better than just picking random elements as the pivots. I hope that this helps out a bit - I really love the quicksort algorithm and all the design decisions involved in building a good quicksort implementation. :-)

Quicksort- how pivot-choosing strategies affect the overall Big-oh behavior of quicksort?

Tags:

java

performance

algorithm

sorting

quicksort

I have came up with several strategies, but I am not entirely sure how they affect the overall behavior. I know the average case is O(NlogN), so I would assume that would be in the answer somewhere. I want to just put NlogN+1 for if I just select the 1st item in the array as the the pivot for the quicksort, but I don't know whether that is either correct nor acceptable? If anyone could enlighten me on this subject that would be great. Thanks!

Possible Strategies:

a) Array is random: pick the first item since that is the most cost effective choice.

b) Array is mostly sorted: pick middle item so we are likely to compliment the binary recursion of splitting in half each time.

c) Array is relatively large: pick first, middle and last indexes in array and compare them, picking the smallest to ensure we avoid worst case.

d) Perform 'c' with randomly generated indexes to make selection less deterministic.

757

asked Feb 15 '11 07:02

Mr_CryptoPrime

1 Answers

An important fact you should know is that in an array of distinct elements, quicksort with a random choice of partition will run in O(n lg n). There are many good proofs of this, and the one on Wikipedia actually has a pretty good discussion of this. If you're willing to go for a slightly less formal proof that's mostly mathematically sound, the intuition goes as follows. Whenever we pick a pivot, let's say that a "good" pivot is a pivot that gives us at least a 75%/25% split; that is, it's greater than at least 25% of the elements and at most 75% of the elements. We want to bound the number of times that we can get a pivot of this sort before the algorithm terminates. Suppose that we get k splits of this sort and consider the size of the largest subproblem generated this way. It has size at most (3/4)^kn, since on each iteration we're getting rid of at least a quarter of the elements. If we consider the specific case where k = log_3/4 (1/n) = log_4/3 n, then the size of the largest subproblem after k good pivots are chosen will be 1, and the recursion will stop. This means that if we choose get O(lg n) good pivots, the recursion will terminate. But on each iteration, what's the chance of getting such a pivot? Well, if we pick the pivot randomly, then there's a 50% chance that it's in the middle 50% of the elements, and so on expectation we'll choose two random pivots before we get a good pivot. Each step of choosing a pivot takes O(n) time, and so we should spend roughly O(n) time before getting each good pivot. Since we get at most O(lg n) good pivots, the overall runtime is O(n lg n) on expectation.

An important detail in the above discussion is that if you replace the 75%/25% split with any constant split - say, a (100 - k%) / k% split - the over asymptotic analysis is the same. You'll get that quicksort takes, on average, O(n lg n) time.

The reason that I've mentioned this proof is that it gives you a good framework for thinking about how to choose a pivot in quicksort. If you can pick a pivot that's pretty close to the middle on each iteartion, you can guarantee O(n lg n) runtime. If you can't guarantee that you'll get a good pivot on any iteration, but can say that on expectation it takes only a constant number of iterations before you get a good pivot, then you can also guarantee O(n lg n) expected runtime.

Given this, let's take a look at your proposed pivot schemes. For (a), if the array is random, picking the first element as the pivot is essentially the same as picking a random pivot at each step, and so by the above analysis you'll get O(n lg n) runtime on expectation. For (b), if you know that the array is mostly sorted, then picking the median is a good strategy. The reason is that if we can say that each element is "pretty close" to where it should be in the sorted sequence, then you can make an argument that every pivot you choose is a good pivot, giving you the O(n lg n) runtime you want. (The term "pretty close" isn't very mathematically precise, but I think you could formalize this without too much difficulty if you wanted to).

As for (c) and (d), of the two, (d) is the only one guaranteed to get O(n lg n) on expectation. If you deterministically pick certain elements to use as pivots, your algorithm will be vulnerable to deterministic sequences that can degenerate it to O(n²) behavior. There's actually a really interesting paper on this called "A Killer Adversary for Quicksort" by McIlroy that describes how you can take any deterministic quicksort and construct a pathologically worst-case input for it by using a malicious comparison function. You almost certainly want to avoid this in any real quicksort implementation, since otherwise malicious users could launch DoS attacks on your code by feeding in these killer sequences to force your program to sort in quadratic time and thus hang. On the other hand, because (d) is picking its sample points randomly, it is not vulnerable to this attack, because on any sequence the choice of pivots is random.

Interestingly, though, for (d), while it doesn't hurt to pick three random elements and take the median, you don't need to do this. The earlier proof is enough to show that you'll get O(n lg n) on expectation with a single random pivot choice. I actually don't know if picking the median of three random values will improve the performance of the quicksort algorithm, though since quicksort is always Ω(n lg n) it certainly won't be asymptotically any better than just picking random elements as the pivots.

I hope that this helps out a bit - I really love the quicksort algorithm and all the design decisions involved in building a good quicksort implementation. :-)

163

answered Oct 22 '22 02:10

templatetypedef

Related questions
                            
                                Processing large xml files
                            
                                Error while starting Hibernate Search in Web app
                            
                                Java RMI - Making the client a server
                            
                                Check for Viruses before uploading Amazon S3? [closed]
                            
                                Why isn't commons-lang in the java Standard API?
                            
                                PowerMock Mockito: how to mock all static methods?
                            
                                Global Access in Java
                            
                                Is there any way to get a reference to the current frame in Java?
                            
                                Change the component weight dynamically in GridBagLayout
                            
                                overloading with both widening and boxing
                            
                                OSGi loose-coupling best practice
                            
                                Any method for accessing Windows COM methods using Java?
                            
                                Can an Android application or activity call APIs in higher API levels?
                            
                                run tasks at scheduled time java,apache
                            
                                Java: Looking for mutable/re-referenceable weak reference implementation
                            
                                Turn EJB into JSON web service
                            
                                Is there a Java library to access the native Windows API?
                            
                                How to influence search path of System.loadLibrary() through Java code?
                            
                                Inserting a Java string in another string without concatenation? [duplicate]
                            
                                Angular 4.3 HTTPClient Basic Authorization not working

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With