Like a Max-heap and Min-heap, I want to implement a Median-heap to keep track of the median of a given set of integers. The API should have the following three functions:
insert(int) // should take O(logN) int median() // will be the topmost element of the heap. O(1) int delmedian() // should take O(logN)
I want to use an array (a) implementation to implement the heap where the children of array index k are stored in array indices 2*k and 2*k + 1. For convenience, the array starts populating elements from index 1. This is what I have so far: The Median-heap will have two integers to keep track of number of integers inserted so far that are > current median (gcm) and < current median (lcm).
if abs(gcm-lcm) >= 2 and gcm > lcm we need to swap a[1] with one of its children. The child chosen should be greater than a[1]. If both are greater, choose the smaller of two.
Similarly for the other case. I can't come up with an algorithm for how to sink and swim elements. I think it should take into consideration how close the number is to the median, so something like:
private void swim(int k) { while (k > 1 && absless(k, k/2)) { exch(k, k/2); k = k/2; } }
I can't come up with the entire solution though.
With our heap size invariant, we can compute the median as the average of the root elements of both heaps, if the sizes of both heaps are (n / 2). Otherwise, the root element of the min-heap is the median.
To build a max heap, you:Assign it a value. Compare the value of the child node with the parent node. Swap nodes if the value of the parent is less than that of either child (to the left or right). Repeat until the largest element is at the root parent nodes (then you can say that the heap property holds).
Making it clear, when the input size is odd, we take the middle element of sorted data. If the input size is even, we pick the average of the middle two elements in the sorted stream. Note that output is the effective median of integers read from the stream so far. Such an algorithm is called an online algorithm.
You need two heaps: one min-heap and one max-heap. Each heap contains about one half of the data. Every element in the min-heap is greater or equal to the median, and every element in the max-heap is less or equal to the median.
When the min-heap contains one more element than the max-heap, the median is in the top of the min-heap. And when the max-heap contains one more element than the min-heap, the median is in the top of the max-heap.
When both heaps contain the same number of elements, the total number of elements is even. In this case you have to choose according your definition of median: a) the mean of the two middle elements; b) the greater of the two; c) the lesser; d) choose at random any of the two...
Every time you insert, compare the new element with those at the top of the heaps in order to decide where to insert it. If the new element is greater than the current median, it goes to the min-heap. If it is less than the current median, it goes to the max heap. Then you might need to rebalance. If the sizes of the heaps differ by more than one element, extract the min/max from the heap with more elements and insert it into the other heap.
In order to construct the median heap for a list of elements, we should first use a linear time algorithm and find the median. Once the median is known, we can simply add elements to the Min-heap and Max-heap based on the median value. Balancing the heaps isn't required because the median will split the input list of elements into equal halves.
If you extract an element you might need to compensate the size change by moving one element from one heap to another. This way you ensure that, at all times, both heaps have the same size or differ by just one element.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With