 

Summing an array of doubles with a large value span: proper algorithm

I have an algorithm where I need to sum (a lot of times) double numbers ranging from about e-40 to e+40.

Array Example (randomly dumped from real application):

-2.06991e-05 
7.58132e-06 
-3.91367e-06 
7.38921e-07 
-5.33143e-09
-4.13195e-11 
4.01724e-14 
6.03221e-17 
-4.4202e-20
6.58873 
-1.22257
-0.0606178 
0.00036508 
2.67599e-07 
0
-627.061
-59.048 
5.92985 
0.0885884
0.000276455 
-2.02579e-07

It goes without saying that I am aware of the rounding effects this will cause; I am trying to keep them under control: the final result should not lose information in the fractional part of the double or, if that is unavoidable, the result should be at least n-digit accurate (with n defined). The end result needs something like 5 significant digits plus the exponent.

After some decent thinking, I ended up with the following algorithm:

  • Sort the array so that the largest absolute value comes first, closest to zero last.
  • Add everything in a loop

The idea is that in this case, any cancellation of large values (negative and positive) will not impact the smaller values that come later. In short:

  • (10e40 - 10e40) + 1 = 1 : result is as expected
  • (1 + 10e40) - 10e40 = 0 : not good

I ended up using std::multiset (a benchmark on my PC showed 20% higher speed with long double compared to plain double; I am fine with double's resolution) with a custom sort function using std::fabs.

It's still quite slow (it takes 5 seconds to do the whole thing), and I still have this feeling of "you missed something in your algo". Any recommendations:

  1. for speed optimization. Is there a better way to sort the intermediate products? Sorting a set of 40 intermediate results (typically) takes about 70% of the total execution time.
  2. for missed issues. Is there a chance of still losing critical data (data that should have ended up in the fractional part of the final result)?

On the bigger picture, I am implementing real-coefficient polynomial classes of a pure imaginary variable (electrical impedances: Z(jw)). Z is a big polynomial representing a user-defined system, with coefficient exponents ranging very far apart.
The "big" comes from adding things like Zc1 = 1/(jC1w) to Zc2 = 1/(jC2w):
Zc1 + Zc2 = ((C1+C2)(jw) + 0) / (C1C2(jw)^2)

In this case, with C1 and C2 in nanofarads (e-9), C1C2 is already at e-18 (and it has only started...)

My sort function uses the Manhattan distance of the complex values (because mine are either purely real or purely imaginary):

struct manhattan_complex_distance
{
        bool operator() (const std::complex<long double>& a, const std::complex<long double>& b) const
        {
            return std::fabs(std::real(a) + std::imag(a)) > std::fabs(std::real(b) + std::imag(b));
        }
};

And my multiset in action:

std::complex<long double> get_value(std::vector<std::complex<long double>>& frequency_vector)
{
    // frequency_vector is precalculated once and for all to hold at index n the value (jw)^n.
    std::multiset<std::complex<long double>, manhattan_complex_distance> temp_list;
    for (std::size_t i = 0; i < m_coeficients.size(); ++i)
    {
        //   element of :       ℝ         *         ℂ
        temp_list.insert(m_coeficients[i] * frequency_vector[i]);
    }
    std::complex<long double> ret = 0;
    for (const auto& i : temp_list)
    {
        // It is VERY important to start adding the big values before adding the small ones.
        // In floating point, 10^60 - 10^60 + 1 = 1, while 1 + 10^60 - 10^60 = 0. Of course you'd expect 1, not 0.
        ret += i;
    }
    return ret;
}

The project is C++11-enabled (mainly for the improvements to the math library and the complex number tools).

PS: I refactored the code to make it easy to read; in reality all the complex and long double names are template parameters: I can change the polynomial type in no time, or use the class for regular polynomials over ℝ.

Mr Buisson asked Jun 21 '16

3 Answers

As GuyGreer suggested, you can use Kahan summation:

double sum = 0.0;
double c = 0.0;  // running compensation for lost low-order bits
for (double value : values) {
    double y = value - c;  // correct the next value by what was lost so far
    double t = sum + y;    // low-order bits of y may be lost in this add...
    c = (t - sum) - y;     // ...and are algebraically recovered here
    sum = t;
}

EDIT: You should also consider using Horner's method to evaluate the polynomial.

// coeffs[i] holds the coefficient of x^i
double value = coeffs[degree];
for (auto i = degree; i-- > 0;) {
    value *= x;
    value += coeffs[i];
}
Tavian Barnes answered Sep 21 '22


Sorting the data is on the right track, but you should definitely be summing from smallest magnitude to largest, not from largest to smallest. Summing from largest to smallest, by the time you get to the smallest values, aligning the next value with the current sum is liable to cause most or all of its bits to 'fall off the end'. Summing from smallest to largest instead, the smallest values get a chance to accumulate a decent-sized sum, more of whose bits will survive into the largest. Combined with Kahan summation, that should yield a fairly accurate sum.

PMar answered Sep 18 '22



First: have your math keep track of error. Replace your doubles with error-aware types that, when you add or multiply two of them, also calculate the maximum error.

This is about the only way you can guarantee that your code produces accurate results while being reasonably fast.

Second, don't use a multiset. The associative containers are not for sorting, they are for maintaining a sorted collection, while being able to incrementally add or remove elements from it efficiently.

The ability to add/remove elements incrementally means it is node-based, and node-based means it is slow in general.

If you simply want a sorted collection, start with a vector then std::sort it.

Next, to minimize error, keep a list of positive and negative elements. Start with zero as your sum. Now pick the smallest of either the positive or negative elements such that the total of your sum and that element is closest to zero.

Do so with elements that calculate their error bounds.

At the end, determine if you have 5 digits of precision, or not.

These error-propagating doubles should ideally be used as early in the algorithm as possible.

Yakk - Adam Nevraumont answered Sep 19 '22