What is the median-of-three strategy for selecting the pivot value in quicksort? I have been reading about it on the web, but I couldn't figure out exactly what it is, or how it is better than randomized quicksort.

median of three values strategy

2 Answers

The median of three has you look at the first, middle and last elements of the array, and choose the median of those three elements as the pivot.

To get the "full effect" of the median of three, it's also important to sort those three items, not just use the median as the pivot. This doesn't affect what's chosen as the pivot in the current iteration, but it can affect what's used as the pivot in the next recursive call, which helps to limit the bad behavior for a few initial orderings. One that turns out to be particularly bad in many cases is an array that's sorted except for having the smallest element at the high end of the array (or the largest element at the low end).
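As a minimal sketch of the step described above, the following C function sorts the first, middle and last elements among themselves and reports where the median ended up. The name `median_of_three_sorted` and the inclusive `lo`/`hi` index convention are assumptions made for illustration; they do not come from the answer.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two ints in place. */
static void swap_int(int *x, int *y) {
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/*
 * Hypothetical helper: sorts a[lo], a[mid] and a[hi] among themselves
 * (lo and hi are inclusive indices) and returns mid, the index that
 * now holds the median of the three.
 */
size_t median_of_three_sorted(int a[], size_t lo, size_t hi) {
    assert(lo <= hi);
    size_t mid = lo + (hi - lo) / 2;  /* avoids overflow of lo + hi */

    if (a[mid] < a[lo]) swap_int(&a[mid], &a[lo]);
    if (a[hi] < a[lo])  swap_int(&a[hi], &a[lo]);   /* a[lo] is now the smallest */
    if (a[hi] < a[mid]) swap_int(&a[hi], &a[mid]);  /* a[hi] is now the largest */

    return mid;
}
```

A partition routine would then use `a[mid]` as the pivot. Because `a[lo]` and `a[hi]` are already on the correct side of it, they also act as sentinels for the scan.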

Compared to picking the pivot randomly:

It ensures that one common case (fully sorted data) remains optimal.
It's more difficult to manipulate into giving the worst case.
A PRNG is often relatively slow.

That second point probably bears a bit more explanation. If you used the obvious (rand()) random number generator, it's fairly easy (for many cases, anyway) for somebody to arrange the elements so it'll continually choose poor pivots. This can be a serious concern for something like a web server that may be sorting data that's been entered by a potential attacker, who could mount a DoS attack by getting your server to waste a lot of time sorting the data. In a case like this, you could use a truly random seed, or you could include your own PRNG instead of using rand(). Or you could use median of three, which also has the other advantages mentioned.

On the other hand, if you use a sufficiently random generator (e.g., a hardware generator or encryption in counter mode), it's probably more difficult to force a bad case than it is for a median of three selection. At the same time, achieving that level of randomness typically has quite a bit of overhead of its own, so unless you really expect to be attacked in this case, it's probably not worthwhile (and if you do, it's probably worth at least considering an alternative that guarantees O(N log N) worst case, such as a merge sort or heap sort).
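For comparison, the "obvious" rand()-based pivot choice discussed above might look like the sketch below. The function name `random_pivot_index` is an assumption for illustration. The code deliberately uses plain `rand()`, so it inherits the predictability concerns described in the answer, as well as a slight modulo bias.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: picks a pivot index in the inclusive range
 * [lo, hi] using the C library's rand(). If the seed is predictable,
 * so is the sequence of pivots. Ranges wider than RAND_MAX are not
 * covered uniformly.
 */
size_t random_pivot_index(size_t lo, size_t hi) {
    assert(lo <= hi);
    return lo + (size_t)rand() % (hi - lo + 1);
}
```

Seeding with something an attacker cannot guess (rather than a constant or the current time) is what the "truly random seed" option above refers to.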

answered Oct 02 '22 11:10

Jerry Coffin

Think faster... C example....

int medianThree(int a, int b, int c) {
    if ((a > b) ^ (a > c))
        return a;
    else if ((b < a) ^ (b < c))
        return b;
    else
        return c;
}

This uses the bitwise XOR operator. So you would read:

Is a greater than exactly one of the others? return a
Is b smaller than exactly one of the others? return b
If neither of the above: return c

Note that by flipping the comparison direction for b, the function also covers all cases where some inputs are equal. It also means one comparison is repeated: a > b is the same as b < a, so a smart compiler can reuse the result.
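The claim about equal inputs can be checked by brute force. The sketch below repeats `medianThree` from the answer and compares it against a straightforward reference over every triple of small values. The helpers `median_reference` and `count_median_mismatches` are hypothetical names introduced only for this check.

```c
#include <assert.h>

/* The XOR-based median from the answer. */
int medianThree(int a, int b, int c) {
    if ((a > b) ^ (a > c))
        return a;
    else if ((b < a) ^ (b < c))
        return b;
    else
        return c;
}

/* Reference median: the sum minus the minimum and the maximum.
 * Intended for small values only, so the sum cannot overflow. */
static int median_reference(int a, int b, int c) {
    int lo = a < b ? a : b;
    if (c < lo) lo = c;
    int hi = a > b ? a : b;
    if (c > hi) hi = c;
    return a + b + c - lo - hi;
}

/* Counts the triples with values in [0, limit) on which the two
 * functions disagree. Ties are included because values repeat. */
int count_median_mismatches(int limit) {
    assert(limit > 0);
    int mismatches = 0;
    for (int a = 0; a < limit; a++)
        for (int b = 0; b < limit; b++)
            for (int c = 0; c < limit; c++)
                if (medianThree(a, b, c) != median_reference(a, b, c))
                    mismatches++;
    return mismatches;
}
```

Only the relative order of the three values matters, so a small `limit` such as 4 already exercises every ordering and every pattern of ties.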

The median approach is usually faster because it tends to partition the array more evenly, since the partitioning is based on the pivot value.

In the worst-case scenario with a random pick or a fixed pick, you would partition every array into one array containing just the pivot and another array with the rest, leading to O(n²) complexity.

Using the median approach makes that much less likely (it does not rule it out entirely, since inputs can still be constructed that defeat median of three), but in exchange you introduce the overhead of calculating the median.
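To make the partitioning argument concrete, here is a rough sketch that counts element comparisons for a simple quicksort (Lomuto partitioning) under two pivot rules: a fixed first-element pick and a median-of-three pick using the XOR test from this answer. All names here (`pivot_first`, `pivot_median3`, `count_comparisons`) are assumptions for illustration, and the counts apply to this particular partition scheme only.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t (*pivot_fn)(const int a[], size_t lo, size_t hi);

static unsigned long g_comparisons;  /* element comparisons made while partitioning */

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Fixed pick: always the first element of the range. */
size_t pivot_first(const int a[], size_t lo, size_t hi) {
    (void)a; (void)hi;
    return lo;
}

/* Median of first, middle and last, using the XOR test. */
size_t pivot_median3(const int a[], size_t lo, size_t hi) {
    size_t mid = lo + (hi - lo) / 2;
    int x = a[lo], y = a[mid], z = a[hi];
    if ((x > y) ^ (x > z)) return lo;
    if ((y < x) ^ (y < z)) return mid;
    return hi;
}

/* Quicksort on the inclusive range [lo, hi] with Lomuto partitioning. */
static void quicksort_range(int a[], size_t lo, size_t hi, pivot_fn choose) {
    if (lo >= hi) return;
    swap_int(&a[choose(a, lo, hi)], &a[hi]);  /* park the pivot at the end */
    int pivot = a[hi];
    size_t store = lo;
    for (size_t i = lo; i < hi; i++) {
        g_comparisons++;
        if (a[i] < pivot) swap_int(&a[i], &a[store++]);
    }
    swap_int(&a[store], &a[hi]);              /* pivot to its final position */
    if (store > lo) quicksort_range(a, lo, store - 1, choose);
    quicksort_range(a, store + 1, hi, choose);
}

/* Sorts a[0..n-1] in place and returns the number of comparisons used. */
unsigned long count_comparisons(int a[], size_t n, pivot_fn choose) {
    assert(n > 0);
    g_comparisons = 0;
    quicksort_range(a, 0, n - 1, choose);
    return g_comparisons;
}
```

On an already sorted array of n elements, the first-element rule peels off one element per call and makes n(n-1)/2 comparisons, while the median-of-three rule splits each range roughly in half, which keeps the count on the order of n log n.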

EDIT:

Benchmark results show XOR is 32 times faster than Bigger, even though I optimized Bigger a little:

[Figure: plot demonstrating benchmarks, https://i.imgur.com/qBJFfeN.png]

Recall that XOR is a very basic operation of the CPU's Arithmetic Logic Unit (ALU), so although it may look a bit hacky in C, under the hood it compiles to the very efficient XOR assembly instruction.

answered Oct 02 '22 13:10

caiohamamura


Tags: algorithm, sorting, quicksort

Abdul Samad
