Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for Fast Median Update

Suppose that at a point in time, you have a collection of N numbers and know the median element: M. Now, you're given a new value, X, and so you may need to update M. (Or rather, you will need to, assuming the numbers you're dealing with are all unique. Also, all the samples are received serially, so there's no issues with concurrency.)

Calculating the new mean is straightforward: take the old mean, add X, multiply by N, and divide by N + 1. (This is clear from inspecting how the average of N elements is defined. I'm not too worried about numerics, for the moment.)

My question is: can anyone suggest a creative/novel (or perhaps, provably-optimal) approach the problem of updating the median? I'll provide an example (simple idea of my own design) below, with a bit of analysis:

In this sample, I'll use a std::forward_list, since C++11 is where I encountered this most recently. Without loss of generality, I'm going to assume you're going about this the right way: maintaining an ordered list of the elements (type T's) encountered thus far, std::forward_list<T> sorted; When T x; appears, simply fold it into place using:

sorted.merge(std::forward_list<T> {{ x }});

By the way, I'm curious if anyone has a better (more efficient/elegant) method for this. Gripes welcome.

So, X is now part of sorted, and here's my idea, in a nutshell:

auto it = sorted.begin(), itend = sorted.end();
typename std::forward_list<T>::size_type count = std::distance(it, itend);
for (const auto &e : sorted) {
    if (it == itend || ++it == itend) {
        M = (count % 2) ? e : (e + M) / 2;
        break;
    } else { ++it; }
}

The nice (if not somewhat hard to see) thing that happens here is that: since you move the iterator forward twice (and safely, I might add, though at the price of two comparisons) for each element, when end() is reached, we'll be at the proper (median) value. If there's an odd number of elements, M is just that sample, if not, it's just the average of this element and the old (pushed-out) median. Because odd and even numbers alternate, either the old or new M will actually be in the collection. This reasoning is sound, yes?

You needn't comment on my O(3n) method if you think it's trash/yours is much better; I'm just suggesting it as a starting point.

like image 845
hagemt Avatar asked Dec 12 '22 14:12

hagemt


2 Answers

You can split your array int two heap trees, of equal size, I of least part or array, and S is of greatest part, and their tops contain maximum and minimum element. Say array 1, 2, 4, 4, 5, 5, 7, 8, 8, 8 organised like this:

 1 4
 \ /
  4   2
   \ /
    5  <--- I's top

    5  <--- S's top
   / \
  7   8
 / \
 8 8

Note, if number of elements is even, then median = top(S)+top(I), and if odd, then one of heaps thould be one element bigger than other, and median is on top of bigger one.

When this is done then updating median is simple, you should add your element to one of heaps and exchange their tops if top(S) became less than top(I).

like image 182
Vovanium Avatar answered Dec 14 '22 03:12

Vovanium


You could use a std::set, and the fact that insertions to sets won't invalidate iterators.

You can hold an iterator mIt to the set's median element if N is odd, and to the left of the two median elements, if N is even.

Let's consider the different cases you can have when you insert elements:

Insertion when N is odd: if the inserted element is smaller than *mIt, old median becomes right of the two new median elements, so decrement the iterator. If it's bigger (or equal, for multiset), all is well.
Insertion when N is even: if the inserted element is bigger (or equal) than *mIt, the old right median becomes median, so increment the iterator. If it's smaller, the old left median becomes median, and all is well.

template <class T>
class MedianHolder {
  std::set<T> elements;
  std::set<T>::const_iterator mIt;

public:
  T const& getMedian() const { return *mIt; }

  void insert(T const& t) {
    if (elements.empty()) {
      mIt = elements.insert(t).first;
      return;
    }

    bool smaller = std::less<T>(t,getMedian());
    bool odd = (elements.size() % 2) == 1;

    if (!elements.insert(t).second)
      return; //not inserted

    if (odd && smaller) --mIt;
    else if (!odd && !smaller) ++mIt;
  }
};

I'll leave erasing elements as an exercise to you ;-)

like image 31
Arne Mertz Avatar answered Dec 14 '22 04:12

Arne Mertz