Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keeping track of the median of an expanding array

Interview Question:

Edited Below You are given an array. You make 2 heaps out of it, one minheap and the other max heap. Now find the median of the array using these 2 provided heaps in O(nlog n) time.

Corrected Question Numbers are randomly generated and stored into an (expanding) array. How would you keep track of the median?

Solution This problem can be solved using 2 heaps and the median can always be accessed in O(1) time.

like image 755
EFreak Avatar asked Feb 02 '11 17:02

EFreak


People also ask

How do you keep track of the median?

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values. For example, for arr = [2,3,4] , the median is 3 . For example, for arr = [2,3] , the median is (2 + 3) / 2 = 2.5 .

How do you find the median of a very large dataset?

The median will be the mean of the number before the n + 1 2 position and the number after the n + 1 2 position: n + 1 2 = 50 + 1 2 = 51 2 = 25.5 . The number before the 25.5 position is 19, and the number after the 25.5 position is 21. This means that the median is 19 + 21 2 = 40 2 = 20 .

What is median of an array?

The median of a sorted array of size N is defined as the middle element when N is odd and average of middle two elements when N is even.

What is a median in math?

Revised on May 23, 2022. The median is the value that's exactly in the middle of a dataset when it is ordered. It's a measure of central tendency that separates the lowest 50% from the highest 50% of values. The steps for finding the median differ depending on whether you have an odd or an even number of data points.


2 Answers

Here's how you use both heaps. Note that I'm assuming you don't know the number of elements, and this is why we must pop until we pop something from the min heap that is larger than or equal to what we pop from the max heap. Note that we return the average because in the case of a set like {1, 2, 3, 4} the median is actually 2.5 (the average of the two "middle" values). I'm assuming double as the value type, but this can obviously be anything. Here:

double min = minheap.pop();
double max = maxheap.pop();
while(min < max) {
    min = minheap.pop();
    max = maxheap.pop();
}

return (min + max) / 2;

Since popping is O(log n) and we have to pop O(n / 2) values, this is O(n log n).

like image 106
jason Avatar answered Oct 14 '22 05:10

jason


A working implementation in java, using 2 heaps, O(n log n). At any time I keep the two heaps balanced in size (ie. they differ at most by 1, if we entered n elements such that n%2==1). Getting the median is O(1). Adding a new element is O(log n).

public class MedianOfStream {

    private int count;
    private PriorityQueue<Integer> highs, lows;

    public MedianOfStream() {
        highs = new PriorityQueue<Integer>(11, new Comparator<Integer>() {
            @Override
            public int compare(Integer arg0, Integer arg1) {
                return arg0.compareTo(arg1);
            }
        });
        lows = new PriorityQueue<Integer>(11, new Comparator<Integer>() {
            @Override
            public int compare(Integer arg0, Integer arg1) {
                return arg1.compareTo(arg0);
            }
        });
    }

    private int getMedian() {
        if (count == 0)
            return 0;
        if (lows.size() == highs.size()) {
            return (lows.peek() + highs.peek()) / 2;
        } else if (lows.size() < highs.size()) {
            return highs.peek();
        }
        return lows.peek();
    }

    private void swap(){
        int h = highs.poll();
        int l = lows.poll();
        highs.add(l);
        lows.add(h);
    }

    public int updateMedian(int n) {
        count++;

        if (count == 1)
            lows.add(n);

        else if (count==2) {
            highs.add(n);
            if(highs.peek()<lows.peek()) {
                swap(); // O(log n)
            }
        }

        else {
            if (n > highs.peek()) {
                lows.add(highs.poll()); // O(log n)
                highs.add(n); // O(log n)
            } else {
                highs.add(lows.poll()); // O(log n)
                lows.add(n); // O(log n)
            }
            if(highs.peek()<lows.peek()) {
                swap(); // O(log n)
            }
        }

        // if we added an even # of items,
        // the heaps must be exactly the same size,
        // otherwise we tolerate a 1-off difference

        if (Math.abs(lows.size() - highs.size()) > (count % 2)) {
            if (lows.size() < highs.size()) {
                lows.add(highs.poll()); // O(log n)
            } else {
                highs.add(lows.poll()); // O(log n)
            }
        }

        return getMedian(); // O(1)
    }
}
like image 41
Savino Sguera Avatar answered Oct 14 '22 05:10

Savino Sguera