Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ Efficiently Calculating a Running Median [duplicate]

Those of you that have read my previous questions know about my work at understanding and implementing quicksort and quickselect, as well as some other basic algorithms.

Quickselect is used to calculate the kth smallest element in an unsorted list, and this concept can also be used to find the median in an unsorted list.

This time, I need aid in devising an efficient technique to calculate the running median, because quickselect isn't a good choice as it needs to re-calculate every time the list changes. Because quickselect has to restart everytime, it can't take advantage of previous calculations done, so I'm looking for a different algorithm that's similar (possibly) but is more efficient in the area of running medians.

like image 573
Edge Avatar asked Jun 07 '12 11:06

Edge


People also ask

How do you calculate median efficiency?

The most obvious way of finding the median of a set of numbers is to sort the list into order and then look at the one half way down the list. In other words, find the value that divides the list into two equal portions one bigger or equal and one smaller or equal than it.

Which one is the best suited data structure to calculate running median of a stream of numbers?

Approach using Heap data structure The idea is to use max heap and min-heap data structure to store the elements of the lower half and higher half respectively. We use a max heap to represent elements that are less than effective median, and a min heap to represent elements that are greater than effective median.

How do you find a median when moving?

M = movmedian( A , k ) returns an array of local k -point median values, where each median is calculated over a sliding window of length k across neighboring elements of A . When k is odd, the window is centered about the element in the current position.


2 Answers

The streaming median is computed using two heaps. All the numbers less than or equal to the current median are in the left heap, which is arranged so that the maximum number is at the root of the heap. All the numbers greater than or equal to the current median are in the right heap, which is arranged so that the minimum number is at the root of the heap. Note that numbers equal to the current median can be in either heap. The count of numbers in the two heaps never differs by more than 1.

When the process begins the two heaps are initially empty. The first number in the input sequence is added to one of the heaps, it doesn’t matter which, and returned as the first streaming median. The second number in the input sequence is then added to the other heap, if the root of the right heap is less than the root of the left heap the two heaps are swapped, and the average of the two numbers is returned as the second streaming median.

Then the main algorithm begins. Each subsequent number in the input sequence is compared to the current median, and added to the left heap if it is less than the current median or to the right heap if it is greater than the current median; if the input number is equal to the current median, it is added to whichever heap has the smaller count, or to either heap arbitrarily if they have the same count. If that causes the counts of the two heaps to differ by more than 1, the root of the larger heap is removed and inserted in the smaller heap. Then the current median is computed as the root of the larger heap, if they differ in count, or the average of the roots of the two heaps, if they are the same size.

Code in Scheme and Python is available at my blog.

like image 69
user448810 Avatar answered Sep 28 '22 06:09

user448810


The Jeff McClintock running median estimate. Requires keeping only two values. This example iterates over an array of sampled values (CPU consumption). Seems to converge relatively quickly (about 100 samples) to an estimate of the median. The idea is at each iteration the median inches toward the input signal at a constant rate. The rate depends on what magnitude you estimate the median to be. I use the average as an estimate of the magnitude of the median, to determines the size of each increment of the median. If you need your median accurate to about 1%, use a step-size of 0.01 * the average.

float median = 0.0f; float average = 0.0f;  // for each sample {     average += ( sample - average ) * 0.1f; // rough running average.     median += _copysign( average * 0.01, sample - median ); } 

enter image description here

like image 23
Jeff McClintock Avatar answered Sep 28 '22 05:09

Jeff McClintock