Assume we have a long array of doubles, say, N == 1000000:

array<double, N> arr;
There are two naive approaches to computing the average. The first:

double result = 0;
for (double x : arr) {
    result += x;
}
result /= arr.size();
This may be inaccurate when the sum of the values gets very big: once the running sum is large, adding each comparatively small element loses low-order bits of precision.
Another approach is:

double result = 0;
for (double x : arr) {
    result += x / arr.size();
}
This may lose precision when the individual terms x / arr.size() are small, and each of the N divisions introduces its own rounding error.
Is there any fail-safe way to calculate a simple average of floating-point numbers? Solutions that use only the standard library are appreciated.
If you have a large number of values to average (which is the only case in which you would have the problem that the sum overflows a double), then this algorithm will have severe underflow issues. Essentially, at some point, (x-avg) becomes zero. – Martin B
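For context, the comment above appears to refer to the incremental running-mean update; a minimal sketch of that approach (the helper name running_mean is just illustrative, not from the thread):

#include <cstddef>
#include <vector>

// Running mean: avg_n = avg_(n-1) + (x_n - avg_(n-1)) / n.
// Once avg is large and x is close to it, (x - avg) / n can round to zero,
// which is the underflow the comment describes.
double running_mean(const std::vector<double>& values) {
    double avg = 0.0;
    std::size_t n = 0;
    for (double x : values) {
        ++n;
        avg += (x - avg) / static_cast<double>(n);
    }
    return avg;
}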
Since all input numbers fit into double range, the mean will also fit into double range, thus a solution using doubles only is possible. – akuhn Dec 19, 2009 at 21:47
Of course. I did propose using the better approaches already suggested. (Some of those objects should be garbage-collected at some point.) – Bozho Dec 19, 2009 at 22:46
If you want to squeeze more accuracy out of doubles, you can use Kahan summation and then divide by the number of elements at the end. There is, however, no standard-library implementation of Kahan summation that I know of.
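A minimal sketch of Kahan (compensated) summation followed by the final division; this is hand-rolled, not a standard-library facility, and the name kahan_mean is just illustrative:

#include <cstddef>
#include <vector>

double kahan_mean(const std::vector<double>& values) {
    double sum = 0.0;
    double c = 0.0;               // running compensation for lost low-order bits
    for (double x : values) {
        double y = x - c;         // apply the compensation to the next term
        double t = sum + y;       // low-order bits of y may be lost here
        c = (t - sum) - y;        // algebraically zero; numerically the lost part
        sum = t;
    }
    return sum / static_cast<double>(values.size());
}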
An easy, standard way (almost like cheating) would of course be to do the calculation in long double, basically using your first implementation and converting the result back to double precision at the end.
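A sketch of that idea, assuming a wider long double is actually available (on some compilers, e.g. MSVC, long double is the same 64 bits as double, so it gains nothing there); the name long_double_mean is illustrative:

#include <cstddef>
#include <vector>

double long_double_mean(const std::vector<double>& values) {
    long double sum = 0.0L;       // accumulate in extended precision
    for (double x : values) {
        sum += x;
    }
    return static_cast<double>(sum / static_cast<long double>(values.size()));
}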
The so-called naive ways are not naive. What do the data mean, and how accurately can you measure those values? Unless the answer is something very unusual, the simple method with doubles is fine. However, single-precision floats are a bit under-powered for general use.
If you add the smallest absolute values first, you might get an extra bit or so of precision. That requires a sort. If the data are all above a certain threshold, subtracting the minimum may also gain you another bit.
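As an illustration of the subtract-the-minimum idea (the helper name shifted_mean is hypothetical): shift by the smallest value, average the residuals, and add the shift back. This mainly helps when the values are large but tightly clustered.

#include <algorithm>
#include <cstddef>
#include <vector>

double shifted_mean(const std::vector<double>& values) {
    // Assumes values is non-empty.
    double lo = *std::min_element(values.begin(), values.end());
    double sum = 0.0;
    for (double x : values) {
        sum += x - lo;            // residuals are smaller, so less rounding error
    }
    return lo + sum / static_cast<double>(values.size());
}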
You can also store a partial total and a partial mean, and check at each stage that partial mean * number processed is within a certain tolerance of the partial total. That won't give you any extra accuracy, but it will tell you if the floating-point arithmetic is too inaccurate for your purposes.
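A sketch of that consistency check, keeping both a running total and a running mean; the tolerance tol and the name checked_mean are arbitrary illustrative choices:

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

double checked_mean(const std::vector<double>& values, double tol = 1e-9) {
    double total = 0.0;
    double mean = 0.0;
    std::size_t n = 0;
    for (double x : values) {
        ++n;
        total += x;
        mean += (x - mean) / static_cast<double>(n);
        // mean * n should track total; a widening gap flags precision trouble.
        if (std::fabs(mean * static_cast<double>(n) - total) > tol * std::fabs(total)) {
            std::fprintf(stderr, "precision warning after %zu elements\n", n);
        }
    }
    return mean;
}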
You can also use long double, or even code your own extended-precision floating-point library (or use someone else's). However, the solutions get increasingly heroic.
One way to reduce the loss of precision is to sort the doubles and add them in ascending order, starting with the smallest values, and then divide the final sum by the number of doubles at the end.
So the tools you need would be std::sort, std::accumulate, and plain old division (/).
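Putting those three tools together, a minimal sketch (the name sorted_mean is just for illustration); for mixed-sign data you might sort by absolute value instead:

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

double sorted_mean(std::vector<double> values) {    // taken by value so the caller's data stays unsorted
    std::sort(values.begin(), values.end());         // smallest values first
    double sum = std::accumulate(values.begin(), values.end(), 0.0);
    return sum / static_cast<double>(values.size());
}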