I'm looking for good study material about computation of weighted median algorithm and/or sample code in C++. The weights of my median are values between 0 and 1. Could you recommend me some links?
1 The Weighted Median. The weighted median is an even better measure of central tendency than the plain median. It is also more “set-oriented” than the plain median. It factors in the number of times the two values in the middle subset of a table with an even number of rows appear.
Real Statistics Function: The weighted median can also be calculated using the function MED(R1, R2) where R1 contains the elements in S and R2 contains the elements in W. If R2 is omitted then the ordinary median is returned, i.e. MEDIAN(R1).
What Is the Weight Formula? The weight of an object is the product of its mass and acceleration due to gravity. The basic formulas to find the weight is: W = mg (Newton) where, W is the weight of the object in Newton.
Calculate the sum of all weights and divide by 2 to get the target sum. Choose a pivot and partition the array into larger and smaller elements. Sum the weights in the smaller partition, and subtract from the total to get the sum in the other partition.
The weighted median is defined like so:
If x
is a sorted array of N
elements, and w
is the array of weights with a total weight W
, then the weighted median is the last x[i]
such that the sum of w[i]
and of all previous weights are less than or equal to S/2
.
In C++, this can be expressed like so (assuming x
, w
and W
are defined as above)
double sum = 0;
int i;
for(i = 0; i < N; ++i)
{
sum += w[i];
if(sum > W/2)
break;
}
double median = x[i-1];
So it seems I answered this question too hastily and made some mistakes. I found a neat description of weighted median from the R documentation, which describes it like so:
For the
n
elementsx = c(x[1], x[2], ..., x[n])
with positive weightsw = c(w[1], w[2], ..., w[n])
such thatsum(w) = S
, the weighted median is defined as the elementx[k]
for which initial the total weight of all elementsx[i] < x[k]
is less or equal toS/2
and for which the total weight of all elementsx[i] > x[k]
is less or equal toS/2
.
From this description, we have a pretty straight-forward implementation of the algorithm. If we start with k == 0
, then there are no elements before x[k]
, so the total weight of elements x[i] < x[k]
will be less than S/2
. Depending on the data, the total weight of the elements x[i] > x[k]
may or may not be less than S/2
. So we can just move forward through the array until this second sum becomes less than or equal to S/2
:
#include <cstddef>
#include <numeric>
#include <iostream>
int main()
{
std::size_t const N = 5;
double x[N] = {0, 1, 2, 3, 4};
double w[N] = {.1, .2, .3, .4, .5};
double S = std::accumulate(w, w+N, 0.0); // the total weight
int k = 0;
double sum = S - w[0]; // sum is the total weight of all `x[i] > x[k]`
while(sum > S/2)
{
++k;
sum -= w[k];
}
std::cout << x[k] << std::endl;
}
Note that if the median is the last element (medianIndex == N-1
), then sum == 0
, so the condition sum > S/2
fails. Thus, k
will never be out of bounds (unless N == 0
!). Also, if there are two elements that satisfy the condition, the algorithm always selects the first one.
Here is an implementation of the weighted median for initially unsorted vectors. It builds on the very good answer by @Ken Wayne VanderLinde for the median calculation, and on the index sorter given in this thread.
template <typename VectorType>
auto sort_indexes(VectorType const& v)
{
std::vector<int> idx(v.size());
std::iota(std::begin(idx), std::end(idx), 0);
std::sort(std::begin(idx), std::end(idx), [&v](int i1, int i2) {return v[i1] < v[i2];});
return idx;
}
template<typename VectorType1, typename VectorType2>
auto weightedMedian(VectorType1 const& x, VectorType2 const& weight)
{
double totalWeight = 0.0;
for (int i = 0; i < static_cast<int>(x.size()); ++i)
{
totalWeight += weight[i];
}
auto ind = sort_indexes(x);
int k = ind[0];
double sum = totalWeight - weight[k];
for (int i = 1; i < static_cast<int>(ind.size()); ++i)
{
k = ind[i];
sum -= weight[k];
if (sum <= 0.5 * totalWeight)
{
break;
}
}
return x[k];
}
It works with any vector type which supports operator[](int)
and size()
(--therefore no use is made of std::accumulate
etc.).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With