Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weighted median computation

Tags:

c++

algorithm

I'm looking for good study material about computation of weighted median algorithm and/or sample code in C++. The weights of my median are values between 0 and 1. Could you recommend me some links?

like image 734
Viper Avatar asked Mar 20 '12 20:03

Viper


People also ask

What is weighted median in statistics?

1 The Weighted Median. The weighted median is an even better measure of central tendency than the plain median. It is also more “set-oriented” than the plain median. It factors in the number of times the two values in the middle subset of a table with an even number of rows appear.

Can you calculate a weighted median in Excel?

Real Statistics Function: The weighted median can also be calculated using the function MED(R1, R2) where R1 contains the elements in S and R2 contains the elements in W. If R2 is omitted then the ordinary median is returned, i.e. MEDIAN(R1).

What is weighting formula?

What Is the Weight Formula? The weight of an object is the product of its mass and acceleration due to gravity. The basic formulas to find the weight is: W = mg (Newton) where, W is the weight of the object in Newton.

How do you find the weighted median of an array?

Calculate the sum of all weights and divide by 2 to get the target sum. Choose a pivot and partition the array into larger and smaller elements. Sum the weights in the smaller partition, and subtract from the total to get the sum in the other partition.


2 Answers

The weighted median is defined like so:

If x is a sorted array of N elements, and w is the array of weights with a total weight W, then the weighted median is the last x[i] such that the sum of w[i] and of all previous weights are less than or equal to S/2.

In C++, this can be expressed like so (assuming x, w and W are defined as above)

double sum = 0;
int i;
for(i = 0; i < N; ++i)
{
    sum += w[i];
    if(sum > W/2)
        break;
}
double median = x[i-1];

EDIT

So it seems I answered this question too hastily and made some mistakes. I found a neat description of weighted median from the R documentation, which describes it like so:

For the n elements x = c(x[1], x[2], ..., x[n]) with positive weights w = c(w[1], w[2], ..., w[n]) such that sum(w) = S, the weighted median is defined as the element x[k] for which initial the total weight of all elements x[i] < x[k] is less or equal to S/2 and for which the total weight of all elements x[i] > x[k] is less or equal to S/2.

From this description, we have a pretty straight-forward implementation of the algorithm. If we start with k == 0, then there are no elements before x[k], so the total weight of elements x[i] < x[k] will be less than S/2. Depending on the data, the total weight of the elements x[i] > x[k] may or may not be less than S/2. So we can just move forward through the array until this second sum becomes less than or equal to S/2:

#include <cstddef>
#include <numeric>
#include <iostream>

int main()
{
  std::size_t const N = 5;
  double x[N] = {0, 1, 2, 3, 4};
  double w[N] = {.1, .2, .3, .4, .5};

  double S = std::accumulate(w, w+N, 0.0); // the total weight

  int k = 0;
  double sum = S - w[0]; // sum is the total weight of all `x[i] > x[k]`

  while(sum > S/2)
  {
    ++k;
    sum -= w[k];
  }

  std::cout << x[k] << std::endl;
}

Note that if the median is the last element (medianIndex == N-1), then sum == 0, so the condition sum > S/2 fails. Thus, k will never be out of bounds (unless N == 0!). Also, if there are two elements that satisfy the condition, the algorithm always selects the first one.

like image 63
Ken Wayne VanderLinde Avatar answered Sep 27 '22 19:09

Ken Wayne VanderLinde


Here is an implementation of the weighted median for initially unsorted vectors. It builds on the very good answer by @Ken Wayne VanderLinde for the median calculation, and on the index sorter given in this thread.

    template <typename VectorType>
    auto sort_indexes(VectorType const& v)
    {
        std::vector<int> idx(v.size());
        std::iota(std::begin(idx), std::end(idx), 0);

        std::sort(std::begin(idx), std::end(idx), [&v](int i1, int i2) {return v[i1] < v[i2];});

        return idx;
    }

    template<typename VectorType1, typename VectorType2>
    auto weightedMedian(VectorType1 const& x, VectorType2 const& weight)
    {
        double totalWeight = 0.0;
        for (int i = 0; i < static_cast<int>(x.size()); ++i)
        {
            totalWeight += weight[i];
        }

        auto ind = sort_indexes(x);

        int k = ind[0];
        double sum = totalWeight - weight[k];

        for (int i = 1; i < static_cast<int>(ind.size()); ++i)
        {
            k = ind[i];
            sum -= weight[k];

            if (sum <= 0.5 * totalWeight)
            {
                break;
            }
        }
        return x[k];
    }

It works with any vector type which supports operator[](int) and size() (--therefore no use is made of std::accumulate etc.).

like image 39
davidhigh Avatar answered Sep 27 '22 18:09

davidhigh