Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Median of Medians

Tags:

median

I have read the order statistics to find the k-th smallest (or largest) element in an array of size n in linear time O(n).

There is one step that it needs to find the median of the medians.

  1. Split the array into [n/5] parts. Each part has 5 elements.
  2. Find the median in each part. (We have [n/5] numbers now)
  3. Repeat step 1 and 2 until we only have the last number. (i.e. recursive)

T(n) = T(n/5) + O(n) and we can get T(n) = O(n).

But, is it true that, the number we finally get is not the median of medians, but the median of medians of medians of medians of medians of medians, if we have a large array.

Please consider an array which has 125 elements.

First, it is split into 25 parts and we find 25 medians. Then, we split these 25 numbers into 5 parts and find 5 medians, Finally, we obtain the number which is median of medians of medians. (Not median of medians)

The reason why I care about it is that, I can understand there are at most about [3/4]*n elements that are smaller (or larger) than the median of medians. But what if it is not the median of medians but the median of medians of medians? In worse case there must be less elements that are smaller (or larger) than the pivot, which means the pivot is closer to the bound of the array.

If we have a VERY large array, and we found its median of medians of medians of medians of medians of medians. In the worst case the pivot we found can still be very close to the bound and what is the time complexity in this case?

I made up a dataset of 125 elements. Is the result 9?

0.8 0.9 1 inf inf
1.8 1.9 2 inf inf
6.8 6.9 7 inf inf
inf inf inf inf inf
inf inf inf inf inf

2.8 2.9 3 inf inf
3.8 3.9 4 inf inf
7.8 7.9 8 inf inf
inf inf inf inf inf
inf inf inf inf inf

4.8 4.9 5 inf inf
5.8 5.9 6 inf inf
8.8 8.9 9 inf inf
inf inf inf inf inf
inf inf inf inf inf

inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf

inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf
inf inf inf inf inf

where inf means the number is large enough.

like image 415
01zhou Avatar asked Jul 10 '13 23:07

01zhou


People also ask

Can you take a median-of-medians?

No, unfortunately there is not a way to calculate the median based on medians of subsets of the whole and still be statistically accurate. If you wanted to calculate the mean, however, you could use the means of subsets, given that they are of equal size.

Is median-of-medians divide and conquer?

It is a divide and conquer algorithm in that, it returns a pivot that in the worst case will divide a list of unsorted elements into sub-problems of size 3n10 3 n 10 and 7n10 7 n 10 assuming we choose a sublist size of 5.

How do you sort median?

Problem Statement The median of a set of values is computed as follows: Sort the values. If the number of values is odd, the median is the middle value. If the number of values is even, the median is the average of the two middle values.


1 Answers

Let's denote your median of medians of medians of ... as [median of]* = M.

First, I believe that median of medians algorithm (to select a good pivot) is not recursive. The algorithm goes as follows:

  1. Split the elements in the groups of 5
  2. Find the median of each group
  3. Find the median of medians and use it as a pivot.

Median of medians will be smaller than 3n/10 elements and larger than another 3n/10 elements, not 3n/4. You have n/5 numbers after selecting medians. Median of median is greater/smaller than half of those numbers, which is n/10. Each of those numbers is a median itself, so it's greater/smaller than 2 numbers, giving you another 2n/10 numbers. Now in total, you get n/10 + 2n/10 = 3n/10 numbers.

To address your second question, after collecting the group of 5's in your example dataset and calculating their medians, we will have the following sequence:

1, 2, 7, inf, inf
3, 4, 8, inf, inf
5, 6, 9, inf, inf, 
inf, inf, inf, inf, inf, 
inf, inf, inf, inf, inf.

So the median of medians would indeed be 9.

Your proposed [median of]* algorithm's runtime will be:

T(n) = O(n * log(n))

Now let's try to analyze how many numbers we have less/greater than M. We have the following groups:

  • depth 1: n/5 elements all medians
  • depth 2: n/25 elements all medians
  • ...
  • depth i: n/(5^i) elements all medians

Each group is less/greater than 2 elements of the previous depth, which is less/greater than 2 elements of the previous depth, and so on:

Calculating in total, we get that our M is greater/less than (n * (2^k) + k * n) /((2^k) * (5^k)). For depth = 1 you get median of medians, which is 3n/10.

Now assuming your depth is [log_5 (n)], i.e. n = 5^k, we get:

5^k * (k + 2^k)/(5^k * 2^k) which is -> 1.

like image 94
gramonov Avatar answered Sep 27 '22 17:09

gramonov