If we compute the mean naively:
#include <numeric>
#include <vector>

std::vector<double> values;
// ... fill values ...
double sum = std::accumulate(begin(values), end(values), 0.0);
double mean = sum / values.size();
and values.size()
is large, we can get inaccurate results, since floating-point numbers have less resolution at larger magnitudes. Or worse, if I understand it correctly, the running sum can overflow and give an infinite result.
When we have an even number of values, we can compute the mean of the first half, then the mean of the second half, and then take the mean of those two means.
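Generalizing the halving idea, one could recursively average the two halves and combine them, weighted by element count so that odd-length ranges also work. A minimal sketch (the function name and interface are hypothetical, not from any library):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the halving idea: recursively compute the mean of
// each half of [lo, hi), then combine the two partial means, weighting each
// by the number of elements it covers so odd splits remain correct.
double pairwise_mean(const std::vector<double>& v, std::size_t lo, std::size_t hi) {
    if (hi - lo == 1) return v[lo];
    std::size_t mid = lo + (hi - lo) / 2;
    double left  = pairwise_mean(v, lo, mid);
    double right = pairwise_mean(v, mid, hi);
    // Weight each partial mean by its element count before dividing.
    return (left * (mid - lo) + right * (hi - mid)) / (hi - lo);
}
```

The largest intermediate quantity here is a half-range sum rather than the full sum, and errors accumulate over O(log n) combine steps instead of n additions, similar to pairwise summation.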
This doesn't seem to be a new problem, but I have trouble finding resources. I think there are more sophisticated techniques with trade-offs in accuracy and performance,
and I wonder whether someone has summarized them somewhere, or better yet, whether they are available in some library.
You can use an online (streaming) algorithm as described here; this running-mean update is the core of Welford's algorithm.
Basically (in Python-ish pseudo-code):
n = 0
mean = 0
for value in data:
    n += 1
    mean += (value - mean) / n
This algorithm is more numerically stable than the naïve implementation.