Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's a good way to add a large number of small floats together?

Say you have 100000000 32-bit floating point values in an array, and each of these floats has a value between 0.0 and 1.0. If you tried to sum them all up like this

result = 0.0;
for (i = 0; i < 100000000; i++) {
    result += array[i];
}

you'd run into problems as result gets much larger than 1.0.

So what are some of the ways to more accurately perform the summation?

like image 420
splicer Avatar asked Mar 16 '10 16:03

splicer


People also ask

Can you do calculations with float?

Floats are critical in most mathematical programs as they allow calculations to be far more precise than when we use integers. For instance, imagine calculating the area of a circle using only integers.

How do you add a float in Python?

Python sum of floats Output: 7.0 If you want to add floating point values with extended precision, you can use math. fsum() function.

How can floating point errors be reduced?

One method to reduce high floating-point errors is to use higher precision to perform floating- point calculation of the original program. For example, one may replace a 32-bit single precision with a 64-bit double precision to improve the accuracy of results.


2 Answers

Sounds like you want to use Kahan Summation.

According to Wikipedia,

The Kahan summation algorithm (also known as compensated summation) significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors).

In pseudocode, the algorithm is:

function kahanSum(input)
 var sum = input[1]
 var c = 0.0          //A running compensation for lost low-order bits.
 for i = 2 to input.length
  y = input[i] - c    //So far, so good: c is zero.
  t = sum + y         //Alas, sum is big, y small, so low-order digits of y are lost.
  c = (t - sum) - y   //(t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
  sum = t             //Algebraically, c should always be zero. Beware eagerly optimising compilers!
 next i               //Next time around, the lost low part will be added to y in a fresh attempt.
return sum
like image 126
Daniel Pryden Avatar answered Sep 23 '22 05:09

Daniel Pryden


Make result a double, assuming C or C++.

like image 36
Tuomas Pelkonen Avatar answered Sep 19 '22 05:09

Tuomas Pelkonen