Intuitive explanation for why QuickSort is n log n?

Tags:

Is anybody able to give a 'plain english' intuitive, yet formal, explanation of what makes QuickSort n log n? From my understanding it has to make a pass over n items, and it does this log n times...Im not sure how to put it into words why it does this log n times.

506

asked May 03 '12 05:05

Jim_CS

2 Answers

Complexity

A Quicksort starts by partitioning the input into two chunks: it chooses a "pivot" value, and partitions the input into those less than the pivot value and those larger than the pivot value (and, of course, any equal to the pivot value have go into one or the other, of course, but for a basic description, it doesn't matter a lot which those end up in).

Since the input (by definition) isn't sorted, to partition it like that, it has to look at every item in the input, so that's an O(N) operation. After it's partitioned the input the first time, it recursively sorts each of those "chunks". Each of those recursive calls looks at every one of its inputs, so between the two calls it ends up visiting every input value (again). So, at the first "level" of partitioning, we have one call that looks at every input item. At the second level, we have two partitioning steps, but between the two, they (again) look at every input item. Each successive level has more individual partitioning steps, but in total the calls at each level look at all the input items.

It continues partitioning the input into smaller and smaller pieces until it reaches some lower limit on the size of a partition. The smallest that could possibly be would be a single item in each partition.

Ideal Case

In the ideal case we hope each partitioning step breaks the input in half. The "halves" probably won't be precisely equal, but if we choose the pivot well, they should be pretty close. To keep the math simple, let's assume perfect partitioning, so we get exact halves every time.

In this case, the number of times we can break it in half will be the base-2 logarithm of the number of inputs. For example, given 128 inputs, we get partition sizes of 64, 32, 16, 8, 4, 2, and 1. That's 7 levels of partitioning (and yes log₂(128) = 7).

So, we have log(N) partitioning "levels", and each level has to visit all N inputs. So, log(N) levels times N operations per level gives us O(N log N) overall complexity.

Worst Case

Now let's revisit that assumption that each partitioning level will "break" the input precisely in half. Depending on how good a choice of partitioning element we make, we might not get precisely equal halves. So what's the worst that could happen? The worst case is a pivot that's actually the smallest or largest element in the input. In this case, we do an O(N) partitioning level, but instead of getting two halves of equal size, we've ended up with one partition of one element, and one partition of N-1 elements. If that happens for every level of partitioning, we obviously end up doing O(N) partitioning levels before even partition is down to one element.

This gives the technically correct big-O complexity for Quicksort (big-O officially refers to the upper bound on complexity). Since we have O(N) levels of partitioning, and each level requires O(N) steps, we end up with O(N * N) (i.e., O(N²)) complexity.

Practical implementations

As a practical matter, a real implementation will typically stop partitioning before it actually reaches partitions of a single element. In a typical case, when a partition contains, say, 10 elements or fewer, you'll stop partitioning and and use something like an insertion sort (since it's typically faster for a small number of elements).

Modified Algorithms

More recently other modifications to Quicksort have been invented (e.g., Introsort, PDQ Sort) which prevent that O(N²) worst case. Introsort does so by keeping track of the current partitioning "level", and when/if it goes too deep, it'll switch to a heap sort, which is slower than Quicksort for typical inputs, but guarantees O(N log N) complexity for any inputs.

PDQ sort adds another twist to that: since Heap sort is slower, it tries to avoid switching to heap sort if possible To to that, if it looks like it's getting poor pivot values, it'll randomly shuffle some of the inputs before choosing a pivot. Then, if (and only if) that fails to produce sufficiently better pivot values, it'll switch to using a Heap sort instead.

answered Oct 14 '22 01:10

Jerry Coffin

Each partitioning operation takes O(n) operations (one pass on the array). In average, each partitioning divides the array to two parts (which sums up to log n operations). In total we have O(n * log n) operations.

I.e. in average log n partitioning operations and each partitioning takes O(n) operations.

answered Oct 14 '22 01:10

Eugene Retunsky

Related questions
                            
                                Print two-dimensional array in spiral order
                            
                                How to find two most distant points?
                            
                                Parabolic knapsack
                            
                                Hash : How does it work internally?
                            
                                Fuzzy search algorithm (approximate string matching algorithm)
                            
                                why is merge sort preferred over quick sort for sorting linked lists
                            
                                What is currently the most secure one-way encryption algorithm?
                            
                                Which are the order matching algorithms most commonly used by electronic financial exchanges?
                            
                                What is the difference between genetic and evolutionary algorithms?
                            
                                What Sorting Algorithm Is Used By LINQ "OrderBy"?
                            
                                Problems with using a rough greyscale algorithm?
                            
                                Algorithm for simplifying decimal to fractions
                            
                                Efficiently getting all divisors of a given number
                            
                                Efficient way to remove half of the duplicate items in a list
                            
                                what is the difference between set and unordered_set in C++?
                            
                                How to round floats to integers while preserving their sum?
                            
                                How to count integers between large A and B with a certain property?
                            
                                Is there a nice way to assign std::minmax(a, b) to std::tie(a, b)?
                            
                                What's the difference between recursion, memoization & dynamic programming? [duplicate]
                            
                                How do ASCII art image conversion algorithms work? [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With