I want to understand "median of medians" algorithm on the following example:
We have 45 distinct numbers divided into 9 group with 5 elements each.
48 43 38 33 28 23 18 13 8 49 44 39 34 29 24 19 14 9 50 45 40 35 30 25 20 15 10 51 46 41 36 31 26 21 16 53 52 47 42 37 32 27 22 17 54
Second step recursively, find the "true" median of the medians (50 45 40 35 30 25 20 15 10
) i.e. the set will be divided into 2 groups:
50 25 45 20 40 15 35 10 30
sorting these 2 groups
30 10 35 15 40 20 45 25 50
the medians is 40 and 15 (in case the numbers are even we took left median) so the returned value is 15 however "true" median of medians (50 45 40 35 30 25 20 15 10
) is 30, moreover there are 5 elements less then 15 which are much less than 30% of 45 which are mentioned in wikipedia
and so T(n) <= T(n/5) + T(7n/10) + O(n)
fails.
By the way in the Wikipedia example, I get result of recursion as 36. However, the true median is 47.
So, I think in some cases this recursion may not return true median of medians. I want to understand where is my mistake.
In computer science, the median of medians is an approximate (median) selection algorithm, frequently used to supply a good pivot for an exact selection algorithm, mainly the quickselect, that selects the kth smallest element of an initially unsorted array.
No, unfortunately there is not a way to calculate the median based on medians of subsets of the whole and still be statistically accurate. If you wanted to calculate the mean, however, you could use the means of subsets, given that they are of equal size.
For calculating the median If an array is sorted, median is the middle element of an array in case of odd number of elements in an array and when number of elements in an array is even than it will be an average of two middle elements.
The most obvious way of finding the median of a set of numbers is to sort the list into order and then look at the one half way down the list. In other words, find the value that divides the list into two equal portions one bigger or equal and one smaller or equal than it.
The problem is in the step where you say to find the true median of the medians. In your example, you had these medians:
50 45 40 35 30 25 20 15 10
The true median of this data set is 30, not 15. You don't find this median by splitting the groups into blocks of five and taking the median of those medians, but instead by recursively calling the selection algorithm on this smaller group. The error in your logic is assuming that median of this group is found by splitting the above sequence into two blocks
50 45 40 35 30
and
25 20 15 10
then finding the median of each block. Instead, the median-of-medians algorithm will recursively call itself on the complete data set 50 45 40 35 30 25 20 15 10
. Internally, this will split the group into blocks of five and sort them, etc., but it does so to determine the partition point for the partitioning step, and it's in this partitioning step that the recursive call will find the true median of the medians, which in this case will be 30. If you use 30 as the median as the partitioning step in the original algorithm, you do indeed get a very good split as required.
Hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With