The question is how to find the median of an incoming stream of integer values (e.g. for 12, 14, 252, 243, 15 the median is 15) in O(log N) per value, where N is the number of values received so far. Note that because the values arrive as a stream, the median has to be re-computed each time a new value is received.
Example:
Step | Input | Median
1    | 12    | 12
2    | 14    | 13 = (12+14)/2
3    | 252   | 14
...
P.S. One application of this algorithm is image filtering (a median filter).
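To find the median of an unsorted array of n elements, we can build a min-heap (O(n) with bottom-up heapify, or O(n log n) by inserting the elements one at a time) and then repeatedly extract the smallest element, roughly n/2 times, until the middle element(s) reach the root. Each extraction costs O(log n), so the whole procedure takes O(n log n) time.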
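A minimal Python sketch of the heap-based approach, assuming Python's standard `heapq` module; the function name `median_via_min_heap` is hypothetical. It pops (n - 1) // 2 elements rather than exactly n/2 so that one loop handles both odd and even n, and for even n it averages the two middle values.

```python
import heapq


def median_via_min_heap(values):
    """Median of an unsorted sequence via a min-heap (hypothetical helper name)."""
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    heap = list(values)      # copy so the caller's data is not reordered
    heapq.heapify(heap)      # bottom-up heap construction, O(n)
    n = len(heap)
    # Discard the (n - 1) // 2 smallest elements, O(log n) each.
    for _ in range((n - 1) // 2):
        heapq.heappop(heap)
    lower_middle = heapq.heappop(heap)  # the middle value for odd n, lower middle for even n
    if n % 2 == 1:
        return lower_middle
    # Even n: the upper middle value is now the heap's root.
    return (lower_middle + heap[0]) / 2
```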
Finding the median in O(n log n)
The most straightforward way to find the median is to sort the list and pick the median by its index. The best comparison-based sorts run in O(n log n) time, so the sort dominates the runtime. Although this method gives the simplest code, it is not the fastest: selection algorithms such as quickselect find the median in O(n) average time.
The median is computed by sorting the values in numerical order and dividing the count n by two. If n is odd, round n/2 up to get the (1-based) position of the median; if n is even, the median is the average of the values at positions n/2 and n/2 + 1.
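A minimal Python sketch of the sort-then-index method, using the position rule just described; the function name `median_by_sorting` is hypothetical.

```python
import math


def median_by_sorting(values):
    """Median via sorting and the 1-based position rule (hypothetical helper name)."""
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    ordered = sorted(values)        # O(n log n) comparison sort
    n = len(ordered)
    if n % 2 == 1:
        pos = math.ceil(n / 2)      # odd n: round n/2 up to get the 1-based position
        return ordered[pos - 1]     # convert to a 0-based index
    pos = n // 2                    # even n: average positions n/2 and n/2 + 1
    return (ordered[pos - 1] + ordered[pos]) / 2
```

Calling it on every prefix of the stream 12, 14, 252, 243, 15 reproduces the medians in the question's example, but re-sorting on each arrival costs O(N log N) per value rather than the O(log N) the question asks for.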
With a heap size invariant in which the min-heap holds either the same number of elements as the max-heap or exactly one more, we can compute the median as the average of the two heaps' roots when both heaps hold n/2 elements. Otherwise, the root of the min-heap is the median.
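A minimal Python sketch of a running median that maintains this invariant, assuming Python's standard `heapq` module. Because `heapq` only provides a min-heap, the max-heap of the lower half stores negated values, a common idiom. The class name `RunningMedian` and the "route each new value through the lower half" insertion step are illustrative choices, not taken from the source.

```python
import heapq


class RunningMedian:
    """Hypothetical two-heap running median; the min-heap may hold one extra element."""

    def __init__(self):
        self.low = []   # max-heap of the lower half, stored as negated values
        self.high = []  # min-heap of the upper half

    def add(self, x):
        # Push x into the lower half and move the lower half's largest value up,
        # so every value in `low` stays <= every value in `high`.
        heapq.heappush(self.high, -heapq.heappushpop(self.low, -x))
        # Restore the size invariant: len(high) == len(low) or len(low) + 1.
        if len(self.high) > len(self.low) + 1:
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        if not self.high:
            raise ValueError("no values received yet")
        if len(self.high) == len(self.low):
            return (self.high[0] - self.low[0]) / 2  # average of the two roots
        return self.high[0]                          # odd count: min-heap root
```

Each `add` performs a constant number of heap operations, so it runs in O(log N), and `median` only reads the roots in O(1).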
Okay, now that the question has been updated to make the intent clear (not just finding the median once, but re-finding it each time a new number arrives), I think there's a way.
I'd start with a pair of heaps: a max-heap and a min-heap. The min-heap will contain the numbers larger than the median, and the max-heap the numbers smaller than the median. When you receive the first number, that's your median. When you receive the second, insert the smaller of the two into the max-heap and the larger into the min-heap. The median is then the average of the min-heap's root (its smallest number) and the max-heap's root (its largest number).
Along with the two heaps, you'll want storage for a single integer (the median spot) that holds the current median whenever you've received an odd number of inputs. Maintaining it is fairly simple: if a new input arrives while the median spot is full, order the two values (the new number and the old median), insert the smaller into the max-heap of smaller items and the larger into the min-heap of larger items, and mark the median spot as empty. The new median is then the mean of the roots of the two heaps.
If a new number arrives while the median spot is empty, compare it to the roots of the two heaps. If it lies between them, it is the new median: put it in the median spot and you're done. Otherwise, extract the root of the heap on the side where the new number falls (the min-heap if the number is larger than the min-heap's root, the max-heap if it is smaller than the max-heap's root), put that root into the median spot, and insert the new number into the heap it came from.
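A minimal Python sketch of the two-heaps-plus-median-spot scheme described in this answer, assuming Python's standard `heapq` module, with the max-heap simulated by storing negated values. The class name `SlotRunningMedian` is hypothetical. `heapq.heapreplace` pops the root and pushes the new value in a single O(log N) call, which corresponds to "extract the root, then insert the new number into that heap".

```python
import heapq


class SlotRunningMedian:
    """Hypothetical running median: two heaps plus a single median spot."""

    def __init__(self):
        self.smaller = []  # max-heap (negated values): numbers below the median
        self.larger = []   # min-heap: numbers above the median
        self.spot = None   # holds the median after an odd number of inputs

    def add(self, x):
        if self.spot is not None:
            # Spot full: order (x, old median), smaller goes down, larger goes up.
            lo, hi = min(x, self.spot), max(x, self.spot)
            heapq.heappush(self.smaller, -lo)
            heapq.heappush(self.larger, hi)
            self.spot = None
        elif not self.larger:
            # First value ever received: it is the median.
            self.spot = x
        elif -self.smaller[0] <= x <= self.larger[0]:
            # x lies between the two roots, so it becomes the median.
            self.spot = x
        elif x > self.larger[0]:
            # Move the min-heap root into the spot and insert x into the min-heap.
            self.spot = heapq.heapreplace(self.larger, x)
        else:
            # x is below the max-heap root: move that root into the spot, insert x there.
            self.spot = -heapq.heapreplace(self.smaller, -x)

    def median(self):
        if self.spot is not None:
            return self.spot
        if not self.larger:
            raise ValueError("no values received yet")
        return (-self.smaller[0] + self.larger[0]) / 2  # mean of the two roots
```

One design note: because the odd element lives in the dedicated spot, the two heaps always have equal sizes after each `add`, so no separate rebalancing step is needed.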
Insertion into and extraction from a binary heap are both O(log N), reading a heap's root is O(1), and everything else involved takes constant time. So each new number costs O(log N), and the current median can be read in O(1).