Interview Question:
Edited Below You are given an array. You make 2 heaps out of it, one minheap and the other max heap. Now find the median of the array using these 2 provided heaps in O(nlog n) time.
Corrected Question Numbers are randomly generated and stored into an (expanding) array. How would you keep track of the median?
Solution This problem can be solved using 2 heaps and the median can always be accessed in O(1) time.
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values. For example, for arr = [2,3,4] , the median is 3 . For example, for arr = [2,3] , the median is (2 + 3) / 2 = 2.5 .
The median will be the mean of the number before the n + 1 2 position and the number after the n + 1 2 position: n + 1 2 = 50 + 1 2 = 51 2 = 25.5 . The number before the 25.5 position is 19, and the number after the 25.5 position is 21. This means that the median is 19 + 21 2 = 40 2 = 20 .
The median of a sorted array of size N is defined as the middle element when N is odd and average of middle two elements when N is even.
Revised on May 23, 2022. The median is the value that's exactly in the middle of a dataset when it is ordered. It's a measure of central tendency that separates the lowest 50% from the highest 50% of values. The steps for finding the median differ depending on whether you have an odd or an even number of data points.
Here's how you use both heaps. Note that I'm assuming you don't know the number of elements, and this is why we must pop until we pop something from the min heap that is larger than or equal to what we pop from the max heap. Note that we return the average because in the case of a set like {1, 2, 3, 4}
the median is actually 2.5
(the average of the two "middle" values). I'm assuming double
as the value type, but this can obviously be anything. Here:
double min = minheap.pop();
double max = maxheap.pop();
while(min < max) {
min = minheap.pop();
max = maxheap.pop();
}
return (min + max) / 2;
Since popping is O(log n)
and we have to pop O(n / 2)
values, this is O(n log n)
.
A working implementation in java, using 2 heaps, O(n log n). At any time I keep the two heaps balanced in size (ie. they differ at most by 1, if we entered n elements such that n%2==1). Getting the median is O(1). Adding a new element is O(log n).
public class MedianOfStream {
private int count;
private PriorityQueue<Integer> highs, lows;
public MedianOfStream() {
highs = new PriorityQueue<Integer>(11, new Comparator<Integer>() {
@Override
public int compare(Integer arg0, Integer arg1) {
return arg0.compareTo(arg1);
}
});
lows = new PriorityQueue<Integer>(11, new Comparator<Integer>() {
@Override
public int compare(Integer arg0, Integer arg1) {
return arg1.compareTo(arg0);
}
});
}
private int getMedian() {
if (count == 0)
return 0;
if (lows.size() == highs.size()) {
return (lows.peek() + highs.peek()) / 2;
} else if (lows.size() < highs.size()) {
return highs.peek();
}
return lows.peek();
}
private void swap(){
int h = highs.poll();
int l = lows.poll();
highs.add(l);
lows.add(h);
}
public int updateMedian(int n) {
count++;
if (count == 1)
lows.add(n);
else if (count==2) {
highs.add(n);
if(highs.peek()<lows.peek()) {
swap(); // O(log n)
}
}
else {
if (n > highs.peek()) {
lows.add(highs.poll()); // O(log n)
highs.add(n); // O(log n)
} else {
highs.add(lows.poll()); // O(log n)
lows.add(n); // O(log n)
}
if(highs.peek()<lows.peek()) {
swap(); // O(log n)
}
}
// if we added an even # of items,
// the heaps must be exactly the same size,
// otherwise we tolerate a 1-off difference
if (Math.abs(lows.size() - highs.size()) > (count % 2)) {
if (lows.size() < highs.size()) {
lows.add(highs.poll()); // O(log n)
} else {
highs.add(lows.poll()); // O(log n)
}
}
return getMedian(); // O(1)
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With