Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Intuitive explanation for why QuickSort is n log n?

Is anybody able to give a 'plain english' intuitive, yet formal, explanation of what makes QuickSort n log n? From my understanding it has to make a pass over n items, and it does this log n times...Im not sure how to put it into words why it does this log n times.

like image 506
Jim_CS Avatar asked May 03 '12 05:05

Jim_CS


People also ask

How do you explain quicksort?

The Idea of QuickSort Quicksort is a fast sorting algorithm that works by splitting a large array of data into smaller sub-arrays. This implies that each iteration works by splitting the input into two components, sorting them, and then recombining them.

Does sorting take log n space?

Any sorting algorithm that operates on a contiguous array requires O⁢(log ⁡n) extra space, since this is the number of bite [sic] required to represent an index into the array.

What is the logic of quicksort?

The key process in quickSort is a partition(). The target of partitions is, given an array and an element x of an array as the pivot, put x at its correct position in a sorted array and put all smaller elements (smaller than x) before x, and put all greater elements (greater than x) after x.

Is N log n better than n2?

So, O(N*log(N)) is far better than O(N^2) . It is much closer to O(N) than to O(N^2) . But your O(N^2) algorithm is faster for N < 100 in real life. There are a lot of reasons why it can be faster.


2 Answers

Complexity

A Quicksort starts by partitioning the input into two chunks: it chooses a "pivot" value, and partitions the input into those less than the pivot value and those larger than the pivot value (and, of course, any equal to the pivot value have go into one or the other, of course, but for a basic description, it doesn't matter a lot which those end up in).

Since the input (by definition) isn't sorted, to partition it like that, it has to look at every item in the input, so that's an O(N) operation. After it's partitioned the input the first time, it recursively sorts each of those "chunks". Each of those recursive calls looks at every one of its inputs, so between the two calls it ends up visiting every input value (again). So, at the first "level" of partitioning, we have one call that looks at every input item. At the second level, we have two partitioning steps, but between the two, they (again) look at every input item. Each successive level has more individual partitioning steps, but in total the calls at each level look at all the input items.

It continues partitioning the input into smaller and smaller pieces until it reaches some lower limit on the size of a partition. The smallest that could possibly be would be a single item in each partition.

Ideal Case

In the ideal case we hope each partitioning step breaks the input in half. The "halves" probably won't be precisely equal, but if we choose the pivot well, they should be pretty close. To keep the math simple, let's assume perfect partitioning, so we get exact halves every time.

In this case, the number of times we can break it in half will be the base-2 logarithm of the number of inputs. For example, given 128 inputs, we get partition sizes of 64, 32, 16, 8, 4, 2, and 1. That's 7 levels of partitioning (and yes log2(128) = 7).

So, we have log(N) partitioning "levels", and each level has to visit all N inputs. So, log(N) levels times N operations per level gives us O(N log N) overall complexity.

Worst Case

Now let's revisit that assumption that each partitioning level will "break" the input precisely in half. Depending on how good a choice of partitioning element we make, we might not get precisely equal halves. So what's the worst that could happen? The worst case is a pivot that's actually the smallest or largest element in the input. In this case, we do an O(N) partitioning level, but instead of getting two halves of equal size, we've ended up with one partition of one element, and one partition of N-1 elements. If that happens for every level of partitioning, we obviously end up doing O(N) partitioning levels before even partition is down to one element.

This gives the technically correct big-O complexity for Quicksort (big-O officially refers to the upper bound on complexity). Since we have O(N) levels of partitioning, and each level requires O(N) steps, we end up with O(N * N) (i.e., O(N2)) complexity.

Practical implementations

As a practical matter, a real implementation will typically stop partitioning before it actually reaches partitions of a single element. In a typical case, when a partition contains, say, 10 elements or fewer, you'll stop partitioning and and use something like an insertion sort (since it's typically faster for a small number of elements).

Modified Algorithms

More recently other modifications to Quicksort have been invented (e.g., Introsort, PDQ Sort) which prevent that O(N2) worst case. Introsort does so by keeping track of the current partitioning "level", and when/if it goes too deep, it'll switch to a heap sort, which is slower than Quicksort for typical inputs, but guarantees O(N log N) complexity for any inputs.

PDQ sort adds another twist to that: since Heap sort is slower, it tries to avoid switching to heap sort if possible To to that, if it looks like it's getting poor pivot values, it'll randomly shuffle some of the inputs before choosing a pivot. Then, if (and only if) that fails to produce sufficiently better pivot values, it'll switch to using a Heap sort instead.

like image 54
Jerry Coffin Avatar answered Oct 14 '22 01:10

Jerry Coffin


Each partitioning operation takes O(n) operations (one pass on the array). In average, each partitioning divides the array to two parts (which sums up to log n operations). In total we have O(n * log n) operations.

I.e. in average log n partitioning operations and each partitioning takes O(n) operations.

like image 22
Eugene Retunsky Avatar answered Oct 14 '22 01:10

Eugene Retunsky