Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding "median of medians" algorithm

I want to understand "median of medians" algorithm on the following example:

We have 45 distinct numbers divided into 9 group with 5 elements each.

48 43 38 33 28 23 18 13 8  49 44 39 34 29 24 19 14 9   50 45 40 35 30 25 20 15 10  51 46 41 36 31 26 21 16 53  52 47 42 37 32 27 22 17 54 
  1. The first step is sorting every group (in this case they are already sorted)
  2. Second step recursively, find the "true" median of the medians (50 45 40 35 30 25 20 15 10) i.e. the set will be divided into 2 groups:

    50 25  45 20   40 15  35 10  30 

    sorting these 2 groups

    30 10  35 15   40 20  45 25  50 

the medians is 40 and 15 (in case the numbers are even we took left median) so the returned value is 15 however "true" median of medians (50 45 40 35 30 25 20 15 10) is 30, moreover there are 5 elements less then 15 which are much less than 30% of 45 which are mentioned in wikipedia

and so T(n) <= T(n/5) + T(7n/10) + O(n) fails.

By the way in the Wikipedia example, I get result of recursion as 36. However, the true median is 47.

So, I think in some cases this recursion may not return true median of medians. I want to understand where is my mistake.

like image 550
simon Avatar asked Feb 28 '12 20:02

simon


People also ask

What is median of medians used for?

In computer science, the median of medians is an approximate (median) selection algorithm, frequently used to supply a good pivot for an exact selection algorithm, mainly the quickselect, that selects the kth smallest element of an initially unsorted array.

Can you take median of medians?

No, unfortunately there is not a way to calculate the median based on medians of subsets of the whole and still be statistically accurate. If you wanted to calculate the mean, however, you could use the means of subsets, given that they are of equal size.

How do I find the median of an element?

For calculating the median If an array is sorted, median is the middle element of an array in case of odd number of elements in an array and when number of elements in an array is even than it will be an average of two middle elements.

How do you calculate median efficiency?

The most obvious way of finding the median of a set of numbers is to sort the list into order and then look at the one half way down the list. In other words, find the value that divides the list into two equal portions one bigger or equal and one smaller or equal than it.


1 Answers

The problem is in the step where you say to find the true median of the medians. In your example, you had these medians:

50 45 40 35 30 25 20 15 10 

The true median of this data set is 30, not 15. You don't find this median by splitting the groups into blocks of five and taking the median of those medians, but instead by recursively calling the selection algorithm on this smaller group. The error in your logic is assuming that median of this group is found by splitting the above sequence into two blocks

50 45 40 35 30 

and

25 20 15 10 

then finding the median of each block. Instead, the median-of-medians algorithm will recursively call itself on the complete data set 50 45 40 35 30 25 20 15 10. Internally, this will split the group into blocks of five and sort them, etc., but it does so to determine the partition point for the partitioning step, and it's in this partitioning step that the recursive call will find the true median of the medians, which in this case will be 30. If you use 30 as the median as the partitioning step in the original algorithm, you do indeed get a very good split as required.

Hope this helps!

like image 92
templatetypedef Avatar answered Oct 13 '22 01:10

templatetypedef