Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Find running median with Max-Heap and Min-Heap

I'm trying to return the running median for a series of streaming numbers. To do that I use a max-heap (which stores the values on the lower half of the series) and a min-heap (which stores the values on the higher half of the series).

In particular I'm using the Python (2.0) built-in min-heap data structure from the heapq module (https://docs.python.org/2/library/heapq.html). To build the max-heap instead I simply use the negative of the numbers I need to push into my heap.

My Python code is the following:

import heapq

maxh = []
minh = []
vals=[1,2,3,4,5,6,7,8,9,10]
for val in vals:

    # Initialize the data-structure and insert/push the 1st streaming value
    if not maxh and not minh:
        heapq.heappush(maxh,-val)
        print float(val)
    elif maxh:

        # Insert/push the other streaming values
        if val>-maxh[0]:
            heapq.heappush(minh,val)
        elif val<-maxh[0]:
            heapq.heappush(maxh,-val)

        # Calculate the median
        if len(maxh)==len(minh):
            print float(-maxh[0]+minh[0])/2
        elif len(maxh)==len(minh)+1:
            print float(-maxh[0])
        elif len(minh)==len(maxh)+1:
            print float(minh[0])

        # If min-heap and max-heap grow unbalanced we rebalance them by
        # removing/popping one element from a heap and inserting/pushing
        # it into the other heap, then we calculate the median
        elif len(minh)==len(maxh)+2:
            heapq.heappush(maxh,-heapq.heappop(minh))
            print float(-maxh[0]+minh[0])/2
        elif len(maxh)==len(minh)+2:
            heapq.heappush(minh,-heapq.heappop(maxh))
            print float(-maxh[0]+minh[0])/2

Below is the full list of test cases I've built to check my code:

vals=[1,2,3,4,5,6,7,8,9,10] # positive numbers, increasing series
vals=[10,9,8,7,6,5,4,3,2,1] # positive numbers, decreasing series
vals=[10,9,11,8,12,7,13,6,14,5] # positive numbers, jumping series (keeping
                                # heaps balanced)

vals=[-10,-9,-8,-7,-6,-5,-4,-3,-2,-1] # negative numbers, increasing series
vals=[-1,-2,-3,-4,-5,-6,-7,-8,-9,-10] # negative numbers, decreasing series
vals=[-10,-9,-11,-8,-12,-7,-13,-6,-14,-5] # negative numbers
                                          # jumping series (keeping heaps
                                          # balanced)

vals=[-5,-4,-3,-2,-1,0,1,2,3,4,5] # mixed positive-negative numbers,
                                  # increasing series
vals=[5,4,3,2,1,0,-1,-2,-3,-4,-5] # mixed positive-negative numbers,
                                  # decreasing series
vals=[0,-1,1,-2,2,-3,3,-4,4,-5,5] # mixed positive-negative numbers,
                                  # jumping series (keeping heaps balanced)

My code seems ok to me but I cannot pass 4 out of 10 test cases with an online judge (https://www.hackerrank.com/challenges/ctci-find-the-running-median/problem).

Do you have any hint?

like image 564
SergeGardien Avatar asked Aug 03 '17 10:08

SergeGardien


People also ask

How would you find a median of large integer input stream?

Making it clear, when the input size is odd, we take the middle element of sorted data. If the input size is even, we pick the average of the middle two elements in the sorted stream. Note that output is the effective median of integers read from the stream so far. Such an algorithm is called an online algorithm.

How do you find the median of a priority queue?

Maintain two priority queues of the numbers greater and less than the median value. Shift values between the two queues such that they stay balanced, or close to balanced, and define the median based on the top values of the priority queues.


1 Answers

The problem is here:

    # Insert/push the other streaming values
    if val>-maxh[0]:
        heapq.heappush(minh,val)
    elif val<-maxh[0]:
        heapq.heappush(maxh,-val)

If val == maxh[0], then the item is never pushed onto either heap. You should be able to reveal the error with the test case [1,1,2].

A simple fix would be:

    # Insert/push the other streaming values
    if val >= -maxh[0]:
        heapq.heappush(minh,val)
    else
        heapq.heappush(maxh,-val)
like image 143
Jim Mischel Avatar answered Oct 14 '22 02:10

Jim Mischel