Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding mean and median in constant time

This is a common interview question. You have a stream of numbers coming in (let's say more than a million). The numbers are between [0-999]).

Implement a class which supports three methods in O(1) 

* insert(int i); 
* getMean(); 
* getMedian(); 

This is my code.

public class FindAverage {

  private int[] store;
  private long size;
  private long total;
  private int highestIndex;
  private int lowestIndex;

  public FindAverage() {
    store  = new int[1000];
    size = 0;
    total = 0;
    highestIndex = Integer.MIN_VALUE;
    lowestIndex = Integer.MAX_VALUE;

  }

  public void insert(int item) throws OutOfRangeException {
    if(item < 0 || item > 999){
      throw new OutOfRangeException();
    }
    store[item] ++;
    size ++;
    total += item;
    highestIndex = Integer.max(highestIndex, item);
    lowestIndex = Integer.min(lowestIndex, item);
  }

  public float getMean(){
    return (float)total/size;
  }

  public float getMedian(){

  }
}

I can't seem to think of a way to get the median in O(1) time. Any help appreciated.

like image 319
Melissa Stewart Avatar asked Mar 28 '17 14:03

Melissa Stewart


Video Answer


3 Answers

For the general case, where range of elements is unlimited, such data structure does not exist based on any comparisons based algorithm, as it will allow O(n) sorting.

Proof: Assume such DS exist, let it be D.
Let A be input array for sorting. (Assume A.size() is even for simplicity, that can be relaxed pretty easily by adding a garbage element and discarding it later).

sort(A):
  ds = new D()
  for each x in A:
    ds.add(x)
  m1 = min(A) - 1
  m2 = max(A) + 1
  for (i=0; i < A.size(); i++):
    ds.add(m1)
  # at this point, ds.median() is smallest element in A
  for (i = 0; i < A.size(); i++):
    yield ds.median()
    # Each two insertions advances median by 1
    ds.add(m2)
    ds.add(m2)

Claim 1: This algorithm runs in O(n).
Proof: Since we have constant operations of add() and median(), each of them is O(1) per iteration, and the number of iterations is linear - the complexity is linear.

Claim 2: The output is sorted(A).
Proof (guidelines): After inserting n times m1, the median is the smallest element in A. Each two insertions after it advances the median by one item, and since the advance is sorted, the total output is sorted.

Since the above algorithm sorts in O(n), and not possible under comparisons model, such DS does not exist.

QED.

like image 126
amit Avatar answered Oct 21 '22 02:10

amit


The possible values that you can read are quite limited - just 1000. So you can think of implementing something like a counting sort - each time a number is input you increase the counter for that value.

To implement the median in constant time, you will need two numbers - the median index(i.e. the value of the median) and the number of values you've read and that are on the left(or right) of the median. I will just stop here hoping you will be able to figure out how to continue on your own.

EDIT(as pointed out in the comments): you already have the array with the sorted elements(stored) and you know the number of elements to the left of the median(size/2). You only need to glue the logic together. I would like to point out that if you use linear additional memory you won't need to iterate over the whole array on each insert.

like image 32
Ivaylo Strandjev Avatar answered Oct 21 '22 02:10

Ivaylo Strandjev


You have already done all the heavy lifting, by building the store counters. Together with the size value, it's easy enough.

You simply start iterating the store, summing up the counts until you reach half of size. That is your median value, if size is odd. For even size, you'll grab the two surrounding values and get their average.

Performance is O(1000/2) on average, which means O(1), since it doesn't depend on n, i.e. performance is unchanged even if n reaches into the billions.

Remember, O(1) doesn't mean instant, or even fast. As Wikipedia says it:

An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input.

In your case, that bound is 1000.

like image 36
Andreas Avatar answered Oct 21 '22 00:10

Andreas