
How to compute mean (average) robustly?

If we compute the mean naively:

#include <numeric>
#include <vector>

std::vector<double> values;
double sum = std::accumulate(begin(values), end(values), 0.0);
double mean = sum / values.size();

and values.size() is large, the result can be inaccurate, because floating-point numbers have coarser resolution at larger magnitudes, so small values added to a large running sum lose their low-order digits. Or worse, if I understand it correctly, the sum can overflow and give an infinite result.
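To make the overflow case concrete, here is a minimal sketch. The names naive_mean and huge_values are made up for illustration; the input is two copies of the largest finite double, whose true mean is representable even though their sum is not.

```cpp
#include <cassert>
#include <limits>
#include <numeric>
#include <vector>

// Naive mean, as in the snippet above: sum everything, then divide.
double naive_mean(const std::vector<double>& values) {
    assert(!values.empty());  // the mean of an empty range is undefined
    double sum = std::accumulate(values.begin(), values.end(), 0.0);
    return sum / static_cast<double>(values.size());
}

// Hypothetical test input: two values at the top of the double range.
// Their true mean is finite, but the intermediate sum overflows.
std::vector<double> huge_values() {
    const double big = std::numeric_limits<double>::max();
    return {big, big};
}
```

With IEEE 754 doubles, naive_mean(huge_values()) should come out as infinity, although the exact mean is simply the largest finite double.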

When we have an even number of values, we can compute the mean of the first half, then the mean of the second half, and take the mean of these two means.
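A rough sketch of that halving idea, applied recursively. Weighting each half by its share of the elements is my own assumption, added so that odd sizes work too; the name halved_mean is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean of values[first, last) by recursive halving (a sketch, not tuned code).
double halved_mean(const std::vector<double>& values,
                   std::size_t first, std::size_t last) {
    assert(first < last);  // the range must be non-empty
    const std::size_t n = last - first;
    if (n == 1) return values[first];
    const std::size_t mid = first + n / 2;
    const double left = halved_mean(values, first, mid);
    const double right = halved_mean(values, mid, last);
    // Assumption: weight each half by its fraction of the elements,
    // which reduces to (left + right) / 2 when the halves are equal.
    const double wl = static_cast<double>(mid - first) / static_cast<double>(n);
    const double wr = static_cast<double>(last - mid) / static_cast<double>(n);
    return left * wl + right * wr;
}

// Convenience overload for a whole vector.
double halved_mean(const std::vector<double>& values) {
    return halved_mean(values, 0, values.size());
}
```

Because no partial result is ever a sum of many elements, intermediate values stay on the scale of the data; the cost is a recursion of logarithmic depth.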

This doesn't seem to be a new problem, but I have trouble finding resources. I think there are more sophisticated techniques with trade-offs in

  • robustness
  • computational complexity
  • difficulty to implement

and I wonder if someone has summarized them somewhere or, even better, if they are available in some library.

Martin Drozdik asked May 22 '14 17:05



1 Answer

You can use an online algorithm as described here.

Basically (in Python-like pseudocode):

n = 0
mean = 0.0

for value in data:
    n += 1
    # Move the running mean toward the new value by 1/n of the gap.
    mean += (value - mean) / n

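This algorithm is more numerically stable than the naïve implementation: it never forms the full sum, so the running value stays on the scale of the data instead of growing with n.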
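Since the question is in C++, here is a direct translation of the pseudocode as a minimal sketch. The function name running_mean is made up, and rejecting empty input with an assert is my own choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Incremental (online) mean: update the estimate one value at a time
// instead of accumulating a potentially huge sum.
double running_mean(const std::vector<double>& values) {
    assert(!values.empty());  // the mean of an empty range is undefined
    double mean = 0.0;
    std::size_t n = 0;
    for (double value : values) {
        ++n;
        // Move the running mean toward the new value by 1/n of the gap.
        mean += (value - mean) / static_cast<double>(n);
    }
    return mean;
}
```

For example, running_mean({1.0, 2.0, 3.0, 4.0}) gives 2.5, and two copies of a value near the top of the double range give that same value back rather than infinity. The trade-off is one division per element instead of one in total.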

Juan Lopes answered Oct 21 '22 08:10
