Incremental median computation with max memory efficiency

Question

I have a process that generates values and that I observe. When the process terminates, I want to compute the median of those values.

If I had to compute the mean, I could just store the sum and the number of generated values and thus have O(1) memory requirement. How about the median? Is there a way to save on the obvious O(n) coming from storing all the values?

Edit: Interested in 2 cases: 1) the stream length is known, 2) it's not.

deinst · Accepted Answer

You are going to need to store at least ceil(n/2) points, because any one of the first n/2 points could be the median. It is probably simplest to just store all the points and find the median at the end. If saving the other half of the memory is worth the effort, read the first n/2 points into a sorted structure (a balanced binary tree is probably best); then, as each new point arrives, throw out the lowest or highest point and keep track of how many points have been thrown out at each end.
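A minimal sketch of the known-length case, under some assumptions: it is a simplified one-sided variant of the idea above, not the two-ended scheme itself. Instead of trimming both ends of a sorted tree, it keeps only the n // 2 + 1 smallest values seen so far in a bounded heap and counts what it drops from the high end. The class name `KnownLengthMedian` and its attributes are hypothetical, and the values are assumed to be plain numbers (the heap stores them negated).

```python
import heapq


class KnownLengthMedian:
    """Median of a stream whose length n is known in advance.

    Keeps at most n // 2 + 1 values: the smallest ones seen so far.
    """

    def __init__(self, n):
        if n <= 0:
            raise ValueError("n must be positive")
        self.n = n
        self.capacity = n // 2 + 1
        self.seen = 0
        self.dropped_high = 0  # how many values were discarded at the high end
        self._heap = []        # negated values, so heapq acts as a max-heap

    def add(self, x):
        self.seen += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, -x)
        else:
            # Push x, then discard the largest of the capacity + 1 candidates.
            heapq.heappushpop(self._heap, -x)
            self.dropped_high += 1

    def median(self):
        if self.seen != self.n:
            raise ValueError("expected exactly n values")
        kept = sorted(-v for v in self._heap)
        upper = kept[-1]  # value at 0-based rank n // 2 of the whole stream
        if self.n % 2 == 1:
            return upper
        return (kept[-2] + upper) / 2  # average of the two middle values
```

Calling `add` once per observed value and `median` at the end gives the exact median. The standard library's `heapq.nsmallest(n // 2 + 1, stream)` follows the same idea if the stream is available as an iterable.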

Edit:

If the stream length is unknown then, as Stephen observed in the comments, we have no choice but to remember everything. If duplicate items are likely, we could possibly save a bit of memory using Dolphin's idea of storing values and counts.
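The values-and-counts idea could look roughly like the sketch below. It assumes the values are hashable and sortable, and the names `CountingMedian` and `_value_at_rank` are made up for illustration. Memory grows with the number of distinct values rather than with the stream length, so it only helps when duplicates are common.

```python
from collections import Counter


class CountingMedian:
    """Median of a stream of unknown length, storing one count per distinct value."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def add(self, x):
        self.counts[x] += 1
        self.total += 1

    def _value_at_rank(self, rank):
        # Walk the distinct values in sorted order, accumulating their counts,
        # until the running total passes the requested 0-based rank.
        running = 0
        for value in sorted(self.counts):
            running += self.counts[value]
            if rank < running:
                return value
        raise IndexError("rank out of range")

    def median(self):
        if self.total == 0:
            raise ValueError("no values seen")
        low = self._value_at_rank((self.total - 1) // 2)
        if self.total % 2 == 1:
            return low
        high = self._value_at_rank(self.total // 2)
        return (low + high) / 2
```

Because nothing depends on the stream length, `median` can be called at any point while the process is still running.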

Stephen · Answer

You can:

- Use statistics, if that's acceptable; for example, you could use sampling.
- Use knowledge about your number stream:
  - use a counting-sort-like approach: k distinct values means O(k) memory;
  - or toss out known outliers and keep a (high, low) counter.
  - If you know you have no duplicates, you could use a bitmap, but that's just a smaller constant for O(n).
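One way the sampling suggestion above might be realized is reservoir sampling (Algorithm R), which keeps a uniform random sample of fixed size from a stream of unknown length. This is a sketch of one possible technique, not something the answer prescribes: the result is only an estimate of the median, and the names `SampledMedian`, `sample_size` and `seed` are illustrative.

```python
import random
import statistics


class SampledMedian:
    """Approximate median from a fixed-size uniform sample of the stream."""

    def __init__(self, sample_size, seed=None):
        if sample_size <= 0:
            raise ValueError("sample_size must be positive")
        self.sample_size = sample_size
        self.sample = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, x):
        self.seen += 1
        if len(self.sample) < self.sample_size:
            # The first sample_size values are always kept.
            self.sample.append(x)
        else:
            # Each later value replaces a random slot with
            # probability sample_size / seen.
            j = self._rng.randrange(self.seen)
            if j < self.sample_size:
                self.sample[j] = x

    def estimate(self):
        # Exact while the stream is no longer than sample_size,
        # an estimate afterwards.
        return statistics.median(self.sample)
```

Memory is O(sample_size) regardless of how long the stream runs; a larger sample trades memory for a tighter estimate.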
