Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Average Runtime of Quickselect

Wikipedia states that the average runtime of quickselect algorithm (Link) is O(n). However, I could not clearly understand how this is so. Could anyone explain to me (via recurrence relation + master method usage) as to how the average runtime is O(n)?

like image 687
Ravi Avatar asked May 10 '11 04:05

Ravi


People also ask

What is the expected runtime of quickselect?

1 The expected running time of QuickSelect is O(n). greater be the left and right arrays obtained if pivot is rank i element of A. greater if j>i and terminates if j = i. As in QuickSort we obtain the following recurrence where T(n) is the worst-case expected time.

What is the best and worst case time complexity for quickselect?

Bookmark this question. Show activity on this post. "However, instead of recursing into both sides, as in quicksort, quickselect only recurses into one side – the side with the element it is searching for. This reduces the average complexity from O(n log n) to O(n), with a worst case of O(n^2)."

What is the average-case time complexity of quick sort?

The average time complexity of quick sort is O(N log(N)). The derivation is based on the following notation: T(N) = Time Complexity of Quick Sort for input of size N.

Why is quick select O of n?

Like quicksort, quickselect has a worst case runtime of O(n^2) (if bad pivots are chosen), not O(n). It can run in expectation time n because it's only sorting one side, as you point out. However, there are algorithms (namely, the median-of-medians algorithm) that do run in worst-case O(n) time.


2 Answers

Because

we already know which partition our desired element lies in.

We do not need to sort (by doing partition on) all the elements, but only do operation on the partition we need.


As in quick sort, we have to do partition in halves *, and then in halves of a half, but this time, we only need to do the next round partition in one single partition (half) of the two where the element is expected to lie in.

It is like (not very accurate)

n + 1/2 n + 1/4 n + 1/8 n + ..... < 2 n

So it is O(n).

Half is used for convenience, the actual partition is not exact 50%.

like image 136
Dante May Code Avatar answered Sep 25 '22 04:09

Dante May Code


To do an average case analysis of quickselect one has to consider how likely it is that two elements are compared during the algorithm for every pair of elements and assuming a random pivoting. From this we can derive the average number of comparisons. Unfortunately the analysis I will show requires some longer calculations but it is a clean average case analysis as opposed to the current answers.

Let's assume the field we want to select the k-th smallest element from is a random permutation of [1,...,n]. The pivot elements we choose during the course of the algorithm can also be seen as a given random permutation. During the algorithm we then always pick the next feasible pivot from this permutation therefore they are chosen uniform at random as every element has the same probability of occurring as the next feasible element in the random permutation.

There is one simple, yet very important, observation: We only compare two elements i and j (with i<j) if and only if one of them is chosen as first pivot element from the range [min(k,i), max(k,j)]. If another element from this range is chosen first then they will never be compared because we continue searching in a sub-field where at least one of the elements i,j is not contained in.

Because of the above observation and the fact that the pivots are chosen uniform at random the probability of a comparison between i and j is:

2/(max(k,j) - min(k,i) + 1)

(Two events out of max(k,j) - min(k,i) + 1 possibilities.)

We split the analysis in three parts:

  1. max(k,j) = k, therefore i < j <= k
  2. min(k,i) = k, therefore k <= i < j
  3. min(k,i) = i and max(k,j) = j, therefore i < k < j

In the third case the less-equal signs are omitted because we already consider those cases in the first two cases.

Now let's get our hands a little dirty on calculations. We just sum up all the probabilities as this gives the expected number of comparisons.

Case 1

Case 2

Similar to case 1 so this remains as an exercise. ;)

Case 3

We use H_r for the r-th harmonic number which grows approximately like ln(r).

Conclusion

All three cases need a linear number of expected comparisons. This shows that quickselect indeed has an expected runtime in O(n). Note that - as already mentioned - the worst case is in O(n^2).

Note: The idea of this proof is not mine. I think that's roughly the standard average case analysis of quickselect.

If there are any errors please let me know.

like image 40
maybeshewill Avatar answered Sep 22 '22 04:09

maybeshewill