Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Quicksort- how pivot-choosing strategies affect the overall Big-oh behavior of quicksort?

I have came up with several strategies, but I am not entirely sure how they affect the overall behavior. I know the average case is O(NlogN), so I would assume that would be in the answer somewhere. I want to just put NlogN+1 for if I just select the 1st item in the array as the the pivot for the quicksort, but I don't know whether that is either correct nor acceptable? If anyone could enlighten me on this subject that would be great. Thanks!

Possible Strategies:

a) Array is random: pick the first item since that is the most cost effective choice.

b) Array is mostly sorted: pick middle item so we are likely to compliment the binary recursion of splitting in half each time.

c) Array is relatively large: pick first, middle and last indexes in array and compare them, picking the smallest to ensure we avoid worst case.

d) Perform 'c' with randomly generated indexes to make selection less deterministic.

like image 757
Mr_CryptoPrime Avatar asked Feb 15 '11 07:02

Mr_CryptoPrime


People also ask

How the selection of pivot affects the performance of quick sort?

Quick sort's complexity varies greatly with the selection of pivot value. for example if you always choose first element as an pivot, algorithm's complexity becomes as worst as O(n^2). here is an smart method to choose pivot element- 1. choose the first, mid, last element of the array.

Why the pivot is important to the Quicksort?

Quicksort is a divide-and-conquer algorithm. It works by selecting a 'pivot' element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. For this reason, it is sometimes called partition-exchange sort.

How the choice of pivot element affects the efficiency of algorithm?

Answer: Choice of pivot element affects the efficiency of algorithm: If pivot element is chosen from first or last element of the given array then it results in worst case scenario with O(n2) time complexity.

What are the ways for selecting pivot in terms of quicksort?

You should choose the pivot randomly. Another way is to choose the median value from the first, the last, and the middle element of the array. Say we have 8 1 6 9 6 3 5 2 7 0 And 6 is chosen as the pivot.


1 Answers

An important fact you should know is that in an array of distinct elements, quicksort with a random choice of partition will run in O(n lg n). There are many good proofs of this, and the one on Wikipedia actually has a pretty good discussion of this. If you're willing to go for a slightly less formal proof that's mostly mathematically sound, the intuition goes as follows. Whenever we pick a pivot, let's say that a "good" pivot is a pivot that gives us at least a 75%/25% split; that is, it's greater than at least 25% of the elements and at most 75% of the elements. We want to bound the number of times that we can get a pivot of this sort before the algorithm terminates. Suppose that we get k splits of this sort and consider the size of the largest subproblem generated this way. It has size at most (3/4)kn, since on each iteration we're getting rid of at least a quarter of the elements. If we consider the specific case where k = log3/4 (1/n) = log4/3 n, then the size of the largest subproblem after k good pivots are chosen will be 1, and the recursion will stop. This means that if we choose get O(lg n) good pivots, the recursion will terminate. But on each iteration, what's the chance of getting such a pivot? Well, if we pick the pivot randomly, then there's a 50% chance that it's in the middle 50% of the elements, and so on expectation we'll choose two random pivots before we get a good pivot. Each step of choosing a pivot takes O(n) time, and so we should spend roughly O(n) time before getting each good pivot. Since we get at most O(lg n) good pivots, the overall runtime is O(n lg n) on expectation.

An important detail in the above discussion is that if you replace the 75%/25% split with any constant split - say, a (100 - k%) / k% split - the over asymptotic analysis is the same. You'll get that quicksort takes, on average, O(n lg n) time.

The reason that I've mentioned this proof is that it gives you a good framework for thinking about how to choose a pivot in quicksort. If you can pick a pivot that's pretty close to the middle on each iteartion, you can guarantee O(n lg n) runtime. If you can't guarantee that you'll get a good pivot on any iteration, but can say that on expectation it takes only a constant number of iterations before you get a good pivot, then you can also guarantee O(n lg n) expected runtime.

Given this, let's take a look at your proposed pivot schemes. For (a), if the array is random, picking the first element as the pivot is essentially the same as picking a random pivot at each step, and so by the above analysis you'll get O(n lg n) runtime on expectation. For (b), if you know that the array is mostly sorted, then picking the median is a good strategy. The reason is that if we can say that each element is "pretty close" to where it should be in the sorted sequence, then you can make an argument that every pivot you choose is a good pivot, giving you the O(n lg n) runtime you want. (The term "pretty close" isn't very mathematically precise, but I think you could formalize this without too much difficulty if you wanted to).

As for (c) and (d), of the two, (d) is the only one guaranteed to get O(n lg n) on expectation. If you deterministically pick certain elements to use as pivots, your algorithm will be vulnerable to deterministic sequences that can degenerate it to O(n2) behavior. There's actually a really interesting paper on this called "A Killer Adversary for Quicksort" by McIlroy that describes how you can take any deterministic quicksort and construct a pathologically worst-case input for it by using a malicious comparison function. You almost certainly want to avoid this in any real quicksort implementation, since otherwise malicious users could launch DoS attacks on your code by feeding in these killer sequences to force your program to sort in quadratic time and thus hang. On the other hand, because (d) is picking its sample points randomly, it is not vulnerable to this attack, because on any sequence the choice of pivots is random.

Interestingly, though, for (d), while it doesn't hurt to pick three random elements and take the median, you don't need to do this. The earlier proof is enough to show that you'll get O(n lg n) on expectation with a single random pivot choice. I actually don't know if picking the median of three random values will improve the performance of the quicksort algorithm, though since quicksort is always Ω(n lg n) it certainly won't be asymptotically any better than just picking random elements as the pivots.

I hope that this helps out a bit - I really love the quicksort algorithm and all the design decisions involved in building a good quicksort implementation. :-)

like image 163
templatetypedef Avatar answered Oct 22 '22 02:10

templatetypedef