Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Median of medians algorithm: why divide the array into blocks of size 5

In median-of-medians algorithm, we need to divide the array into chunks of size 5. I am wondering how did the inventors of the algorithms came up with the magic number '5' and not, may be, 7, or 9 or something else?

like image 659
Darth.Vader Avatar asked Aug 07 '13 06:08

Darth.Vader


People also ask

Is median of medians divide and conquer?

It is a divide and conquer algorithm in that, it returns a pivot that in the worst case will divide a list of unsorted elements into sub-problems of size 3n10 3 n 10 and 7n10 7 n 10 assuming we choose a sublist size of 5.

Why does median of medians work?

The median-of-medians algorithm is a deterministic linear-time selection algorithm. The algorithm works by dividing a list into sublists and then determines the approximate median in each of the sublists. Then, it takes those medians and puts them into a list and finds the median of that list.

Why is median of medians linear?

The median is a good pivot – the best for sorting, and the best overall choice for selection – decreasing the search set by half at each step. Thus if one can compute the median in linear time, this only adds linear time to each step, and thus the overall complexity of the algorithm remains linear.

Does the algorithm run in linear time if the elements are divided into groups of 3 elements?

So with groups of 3 elements SELECT does not run in linear time. The reason for this is that during step 5, we are still left with a subproblems of total size n.


3 Answers

The number has to be larger than 3 (and an odd number, obviously) for the algorithm. 5 is the smallest odd number larger than 3. So 5 was chosen.

like image 191
rabensky Avatar answered Sep 18 '22 16:09

rabensky


I think that if you'll check "Proof of O(n) running time" section of wiki page for medians-of-medians algorithm:

The median-calculating recursive call does not exceed worst-case linear behavior because the list of medians is 20% of the size of the list, while the other recursive call recurses on at most 70% of the list, making the running time

Image

The O(n) term c n is for the partitioning work (we visited each element a constant number of times, in order to form them into n/5 groups and take each median in O(1) time). From this, using induction, one can easily show that

Image

That should help you to understand, why.

like image 34
Alma Do Avatar answered Sep 18 '22 16:09

Alma Do


You can also use blocks of size 3 or 4, as shown in the paper Select with groups of 3 or 4 by K. Chen and A. Dumitrescu (2015). The idea is to use the "median of medians" algorithm twice and partition only after that. This lowers the quality of the pivot but is faster.

So instead of:

T(n) <= T(n/3) + T(2n/3) + O(n)
T(n) = O(nlogn)

one gets:

T(n) <= T(n/9) + T(7n/9) + O(n)
T(n) = Theta(n)
like image 38
Ecir Hana Avatar answered Sep 21 '22 16:09

Ecir Hana