
find median in a fixed-size moving window along a long sequence of data

Given a sequence of data (it may contain duplicates) and a fixed-size moving window, slide the window one position per iteration from the start of the sequence, such that (1) the oldest element is removed from the window and a new element is pushed into it, and (2) the median of the data inside the window is found after each move.

The following posts are not helpful.

Effectively to find the median value of a random sequence

joining data based on a moving time window in R

My idea:

Use two heaps to track the median. In the first iteration, sort the data inside the window; the min-heap holds the larger half and the max-heap holds the smaller half. If the window holds an odd number of elements, the top of the max-heap is the median; otherwise the median is the arithmetic mean of the tops of the two heaps.
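To make the two-heap idea concrete, here is a minimal sketch for a single window. It is only an illustration under some assumptions: the name `windowMedian` is made up, the elements are taken to be `double`, and instead of sorting first it pushes the elements one by one and rebalances, which gives the same two halves.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: median of one window via two heaps.
// lower is a max-heap holding the smaller half,
// upper is a min-heap holding the larger half.
// Invariant after each step: lower.size() == upper.size()
//                         or lower.size() == upper.size() + 1.
double windowMedian(const std::vector<double>& window) {
    assert(!window.empty());
    std::priority_queue<double> lower;  // max-heap
    std::priority_queue<double, std::vector<double>,
                        std::greater<double>> upper;  // min-heap

    for (double x : window) {
        // Route the element to the half it belongs to.
        if (lower.empty() || x <= lower.top()) {
            lower.push(x);
        } else {
            upper.push(x);
        }
        // Restore the size invariant by moving one top element.
        if (lower.size() > upper.size() + 1) {
            upper.push(lower.top());
            lower.pop();
        } else if (upper.size() > lower.size()) {
            lower.push(upper.top());
            upper.pop();
        }
    }

    // Odd count: the max-heap top is the median.
    if (lower.size() > upper.size()) {
        return lower.top();
    }
    // Even count: mean of the two tops.
    return (lower.top() + upper.top()) / 2.0;
}
```

This covers only the first window. `std::priority_queue` has no operation for removing an arbitrary element, which is the difficulty with sliding the window.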

When a new element is pushed into the window, remove the oldest element from whichever heap holds it, then compare the new element with the tops of the max-heap and the min-heap to decide which heap it goes into. Then find the median just as in the first iteration.

But finding an arbitrary element in a heap is a problem. A heap is a binary tree, not a binary search tree.

Is it possible to solve this in O(n) or O(n * lg m) time and O(1) space, where n is the length of the sequence and m is the window size?

Any help is really appreciated.

Thanks

asked Mar 23 '12 14:03 by user1002288




1 Answer

O(n*lg m) is easy:

Just maintain your window as two std::multisets (not std::set, since your data may contain duplicates), one for the lower half and one for the upper half. Inserting a new element costs O(lg m), and finding and removing an old element costs the same. Determining the median using the method you described in your question costs O(1).

As you slide the window over your sequence, in each iteration you remove the item falling out of the window (O(lg m)), insert the new item (O(lg m)) and compute the median (O(1)), resulting in a total of O(n lg m).
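A minimal sketch of that loop, under some assumptions: the function name `slidingMedian` is made up, elements are `double` with no NaN values, and `std::multiset` is used so that duplicates are kept. It inserts the new item first and then erases the one that fell out of the window, rebalancing after each step so that the lower half is never smaller than the upper half and never more than one element larger.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical helper: medians of every full window of size m over data.
// lower holds the smaller half, upper the larger half.
// Invariant: lower.size() == upper.size() or lower.size() == upper.size() + 1.
std::vector<double> slidingMedian(const std::vector<double>& data,
                                  std::size_t m) {
    assert(m > 0);
    std::multiset<double> lower, upper;
    std::vector<double> medians;

    // Move at most one element across to restore the size invariant.
    auto rebalance = [&] {
        if (lower.size() > upper.size() + 1) {
            auto it = std::prev(lower.end());  // largest of the lower half
            upper.insert(*it);
            lower.erase(it);
        } else if (upper.size() > lower.size()) {
            auto it = upper.begin();           // smallest of the upper half
            lower.insert(*it);
            upper.erase(it);
        }
    };

    for (std::size_t i = 0; i < data.size(); ++i) {
        // Insert the new item: O(lg m).
        if (lower.empty() || data[i] <= *lower.rbegin()) {
            lower.insert(data[i]);
        } else {
            upper.insert(data[i]);
        }
        rebalance();

        // Remove the item falling out of the window: O(lg m).
        if (i >= m) {
            const double old = data[i - m];
            auto it = lower.find(old);
            if (it != lower.end()) {
                lower.erase(it);               // erase one copy only
            } else {
                upper.erase(upper.find(old));
            }
            rebalance();
        }

        // Read off the median once the window is full: O(1).
        if (i + 1 >= m) {
            if (m % 2 == 1) {
                medians.push_back(*lower.rbegin());
            } else {
                medians.push_back((*lower.rbegin() + *upper.begin()) / 2.0);
            }
        }
    }
    return medians;
}
```

Erasing by iterator rather than by value matters here: `erase(value)` on a multiset removes every equal element, while the window should lose only one copy.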

This solution uses O(m) space, of course, but I don't think you can get away without storing the window's contents.

answered Sep 24 '22 02:09 by hc_