
Is the Boost library's weighted median broken?

I confess that I am no expert in C++.

I am looking for a fast way to compute a weighted median, which Boost appears to provide, but I cannot get it to work.

#include <iostream>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/median.hpp>
#include <boost/accumulators/statistics/weighted_median.hpp>
using namespace boost::accumulators;    

int main()
{
  // Define an accumulator set
  accumulator_set<double, stats<tag::median > > acc1;
  accumulator_set<double, stats<tag::median >, float> acc2;

  // push in some data ...
  acc1(0.1);
  acc1(0.2);
  acc1(0.3);
  acc1(0.4);
  acc1(0.5);
  acc1(0.6);

  acc2(0.1, weight=0.);
  acc2(0.2, weight=0.);
  acc2(0.3, weight=0.);
  acc2(0.4, weight=1.);
  acc2(0.5, weight=1.);
  acc2(0.6, weight=1.);

  // Display the results ...
  std::cout << "         Median: " << median(acc1) << std::endl;
  std::cout << "Weighted Median: " << median(acc2) << std::endl;

  return 0;
}

This produces the following output, which is clearly wrong, since the three smallest values all have zero weight:

         Median: 0.3
Weighted Median: 0.3

Am I doing something wrong? Any help will be greatly appreciated.

* however, the weighted sum works correctly *

@glowcoder: The weighted sum works perfectly fine like this.

#include <iostream>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/sum.hpp>
#include <boost/accumulators/statistics/weighted_sum.hpp>
using namespace boost::accumulators;

int main()
{
  // Define an accumulator set
  accumulator_set<double, stats<tag::sum > > acc1;
  accumulator_set<double, stats<tag::sum >, float> acc2;
  // accumulator_set<double, stats<tag::median >, float> acc2;

  // push in some data ...
  acc1(0.1);
  acc1(0.2);
  acc1(0.3);
  acc1(0.4);
  acc1(0.5);
  acc1(0.6);

  acc2(0.1, weight=0.);
  acc2(0.2, weight=0.);
  acc2(0.3, weight=0.);
  acc2(0.4, weight=1.);
  acc2(0.5, weight=1.);
  acc2(0.6, weight=1.);

  // Display the results ...
  std::cout << "         Sum: " << sum(acc1) << std::endl;
  std::cout << "Weighted Sum: " << sum(acc2) << std::endl;

  return 0;
}

and the result is

         Sum: 2.1
Weighted Sum: 1.5

Sang asked Feb 24 '11 22:02


3 Answers

The Boost function is not broken.

The problem is that you do not provide enough data for the P^2 estimator to work. If you put a loop around your data input, such as

for(int i=0;i<100000;i++){
  acc2(0.1, weight=0.);
  acc2(0.2, weight=0.);
  acc2(0.3, weight=0.);
  acc2(0.4, weight=1.);
  acc2(0.5, weight=1.);
  acc2(0.6, weight=1.);
}

you get the correct result of

Median: 0.3
Weighted Median: 0.5

Alternatively, you can specify

 accumulator_set<double, 
    stats<tag::weighted_median(with_p_square_cumulative_distribution) >, 
    double> acc2 ( p_square_cumulative_distribution_num_cells = 5 );

which gives Weighted Median: 0.55, even with only the 6 points from your question.
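
As a cross-check on what the exact answer should be, here is a minimal sketch of an exact weighted median that uses only the standard library. It assumes non-negative weights and takes the lower weighted median (the smallest value whose cumulative weight reaches half of the total), which is one common convention and not necessarily what the Boost estimators return in general. `exact_weighted_median` is a hypothetical helper name, not a Boost function.

```cpp
#include <algorithm>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Boost): exact lower weighted median.
// Each sample is a (value, weight) pair; weights are assumed non-negative.
double exact_weighted_median(std::vector<std::pair<double, double>> samples)
{
  double total = 0.0;
  for (const auto& s : samples) total += s.second;
  if (samples.empty() || total <= 0.0)
    throw std::invalid_argument("need at least one sample with positive weight");

  // Sort by value so the weights can be accumulated in value order.
  std::sort(samples.begin(), samples.end());

  // Return the first value whose cumulative weight reaches half the total.
  double cumulative = 0.0;
  for (const auto& s : samples) {
    cumulative += s.second;
    if (cumulative >= 0.5 * total) return s.first;
  }
  return samples.back().first;  // only reachable through rounding error
}
```

For the six (value, weight) pairs in the question the total weight is 3, and the cumulative weight first reaches 1.5 at the value 0.5, so the function returns 0.5. This needs all the data in memory and an O(n log n) sort, which is the cost the streaming estimators are designed to avoid.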

hannes answered Nov 14 '22 10:11


What is a weighted median supposed to mean? A median considers only the order of the items, not the content. A weight doesn't change the order (it can change the mean or the sum, though). If you used occurrence counts (natural integers) instead of floats, you could extend the definition of the median, but I don't think that's what you're trying to do here.
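
To make the occurrence-count reading concrete, here is a minimal sketch that repeats each value by its integer count and then takes an ordinary median of the expanded list. `median_by_counts` is a hypothetical name, and picking the lower of the two middle elements for an even number of items is an assumption made for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: median of values repeated by integer occurrence counts.
// Each entry is a (value, count) pair; a count of zero drops the value.
double median_by_counts(const std::vector<std::pair<double, unsigned>>& counted)
{
  // Expand every value into `count` copies.
  std::vector<double> expanded;
  for (const auto& c : counted)
    expanded.insert(expanded.end(), c.second, c.first);
  if (expanded.empty())
    throw std::invalid_argument("all counts are zero");

  // Lower median: for an even number of items, take the lower middle element.
  const std::size_t mid = (expanded.size() - 1) / 2;
  std::nth_element(expanded.begin(), expanded.begin() + mid, expanded.end());
  return expanded[mid];
}
```

With counts 0, 0, 0, 1, 1, 1 on the values 0.1 through 0.6, only 0.4, 0.5 and 0.6 survive the expansion, so the result is 0.5.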

Tobu answered Nov 14 '22 12:11


What about:

accumulator_set<double, stats<tag::weighted_median(with_weighted_density) >, float> acc2;

chrisaycock answered Nov 14 '22 11:11