In my C++ app I have a vector of doubles in the range (0,1), and I have to calculate its total as accurately as possible. It feels like this issue must have been addressed before, but I can't find anything.
Obviously, iterating through the vector and doing sum += vect[i] accumulates significant error if the vector is large and some items are significantly smaller than the others.
My current solution is this function:
double sumDoubles(vector<double> arg) // pass by copy, so the caller's data is untouched
{
    sort(arg.rbegin(), arg.rend()); // sort in descending order
    for (size_t i = 1; i <= arg.size(); i *= 2)
        for (size_t j = 0; j + i < arg.size(); j += 2 * i)
            arg[j] += arg[j + i];
    return arg[0]; // assumes a non-empty vector
}
Basically it sorts the input in descending order and calculates pairwise sums:
a+b+c+d+e+f+g+h=((a+b)+(c+d))+((e+f)+(g+h))
Like constructing a binary tree, but doing it in place. Sorting should ensure that at each step the two summands are of comparable magnitude.
The code above does perform better than a single loop with a running sum. However, I am curious whether it is possible to increase precision further without degrading performance too much.
One way to reduce floating-point error is to carry out the calculation at a higher precision than the inputs. For example, accumulating 32-bit single-precision values in a 64-bit double (or doubles in a long double, where the platform actually provides extra bits) improves the accuracy of the result.
The accuracy of a floating-point format is governed by the number of significand bits, while its range is limited by the exponent. Not every real number can be represented exactly: the format uses a fixed number of binary digits, and many decimal values have no finite binary representation, so small round-off errors creep in and accumulate.
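For doubles, that might look like the following minimal sketch, which simply accumulates in long double (the function name is just for illustration). Note that this only helps on platforms where long double is wider than double, e.g. 80-bit extended precision on typical x86 Linux toolchains; on MSVC it is the same 64 bits and gains nothing:

#include <numeric>
#include <vector>

double sumWide(const std::vector<double>& v)
{
    // Passing 0.0L as the initial value makes std::accumulate use a
    // long double accumulator, so intermediate rounding happens at the
    // wider precision before the final narrowing back to double.
    long double acc = std::accumulate(v.begin(), v.end(), 0.0L);
    return static_cast<double>(acc);
}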
One of the standard ways of approaching this issue is the Kahan summation algorithm. It reduces your worst-case error to something that depends only on the floating-point precision, rather than growing in proportion to the length of your vector, and it does so in O(n) time, albeit with a few more operations per iteration.
Kahan summation will likely outperform your current sumDoubles, since you sort on every call, and it also improves on pairwise summation's O(log n) error growth, bringing it down to O(1). That said, if the sort is unnecessary, pairwise summation will likely outperform Kahan summation (because of the extra per-iteration arithmetic) with what may be, for your circumstances, acceptable error growth.
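A minimal sketch of Kahan summation (the function name and signature are just for illustration; also be aware that aggressive flags such as -ffast-math may reassociate the arithmetic and silently optimize the compensation away):

#include <vector>

double kahanSum(const std::vector<double>& v)
{
    double sum = 0.0; // running total
    double c = 0.0;   // compensation: low-order bits lost in previous additions
    for (double x : v)
    {
        double y = x - c;   // apply the correction carried over from the last step
        double t = sum + y; // big + small: the low-order bits of y are lost here
        c = (t - sum) - y;  // algebraically zero; in floating point it recovers those lost bits
        sum = t;
    }
    return sum;
}

Each value is corrected by the previously lost bits before being folded into the running total, which is why the error stops growing with the number of elements.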