I built a parallel sum code to add up a large number of floats, and I found that when the number of elements is bigger than 100000000, the result goes wrong. I then built a serial version to compare, and the serial code also gives the wrong number. Does anybody know why? Thanks!
My simple code is as follows.
The result is "1.67772e+007". It should be 1e+008.
#include <iostream>
#include <vector>
#include <ctime>
#include "tbb/task_scheduler_init.h"
using namespace std;
using namespace tbb;

int main()
{
    size_t N = 100000000;
    cout << "n is : " << N << endl;

    clock_t start = clock();
    task_scheduler_init init;

    // fill a vector with N copies of 1.0f
    vector<float> myvec;
    vector<float>* pvec;
    for (size_t i = 0; i < N; i++)      // size_t to match N
        myvec.push_back(1.0f);
    pvec = &myvec;

    float mysum;
    mysum = parallelSum(pvec);          // parallel reduction (definition not shown)
    cout << " the p sum is: " << mysum << endl;

    clock_t finish = clock();
    cout << "Time Used = " << (finish - start) / CLOCKS_PER_SEC << endl;

    // serial reference sum
    mysum = 0;
    for (size_t i = 0; i < N; i++)
        mysum += myvec[i];
    cout << " the s sum is: " << mysum << endl;

    return 0;
}
Your problem is due to the limited precision of floating point numbers.
While
1.0f + 1.0f == 2.0f,
you will find that
16777216.0f + 1.0f == 16777216.0f
The extra 1.0f is simply thrown away, since 16777217 cannot be represented in float format.
Looking at your result, 1.67772e+007, it's clear that this is exactly what has happened.
Your expected result, 100000000.0, is about six times bigger than 16777216.0f, but once the running sum reaches 16777216.0f it stays there for the remaining 83222784 additions of 1.0f.
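You can verify the cutoff with a minimal standalone sketch (no TBB needed; the constants come straight from the explanation above):

#include <iostream>

int main()
{
    float f = 16777216.0f;                      // 2^24
    std::cout << (f + 1.0f == f) << std::endl;  // prints 1: the 1.0f is rounded away
    std::cout << (f + 2.0f == f) << std::endl;  // prints 0: 16777218.0f is representable
    return 0;
}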
Solution: try using double instead of float, which goes up to 9007199254740992.0 before hitting this problem.
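Applied to your serial loop, that is a one-type change; a minimal sketch, reusing your variable names and filling the vector the same way:

#include <iostream>
#include <vector>
using namespace std;

int main()
{
    size_t N = 100000000;
    vector<float> myvec(N, 1.0f);    // same data as your push_back loop

    double mysum = 0.0;              // 53 bits of precision: integers exact up to 2^53
    for (size_t i = 0; i < N; i++)
        mysum += myvec[i];           // each 1.0f converts to double exactly
    cout << " the s sum is: " << mysum << endl;   // prints 1e+08
    return 0;
}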
In single precision floating point there are only 24 bits of precision available, and 2^24 is 16777216. There is no way to encode 16777217 in 24 bits, so the sum simply stays at 16777216, on the reasoning that it's close enough to the real answer. The real problem arises when you add lots of very small numbers to a big number, where the sum of the small numbers is significant relative to the big one, even though individually they are not.
Note that 16777216.0f is not the largest number that can be represented in float; it just marks the limit of precision. For example, 16777216.0f x 2^4 + 2^4 => 16777216.0f x 2^4.
double has 53 bits of precision, so it can encode integers up to 2^53, or 9007199254740992.0, before hitting the point where adding 1.0 fails.
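The same experiment at double's limit, using the constant from the paragraph above:

#include <iostream>

int main()
{
    double d = 9007199254740992.0;              // 2^53
    std::cout << (d + 1.0 == d) << std::endl;   // prints 1: 2^53 + 1 is not representable
    std::cout << (d + 2.0 == d) << std::endl;   // prints 0: 2^53 + 2 is
    return 0;
}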
This issue also represents another hazard for parallelising floating point operations: floating point addition is not associative. In other words, your sequential algorithm:
Sum(A) = (...((((A1 + A2) + A3) + A4) ... A10000)
may produce a different result from the parallelised version:
Sum(A) = (...((((A1 + A2) + A3) + A4) ... A1000)
+ (...((((A1001 + A1002) + A1003) + A1004) ... A2000)
+ (...((((A2001 + A2002) + A2003) + A2004) ... A3000)
...
+ (...((((A9001 + A9002) + A9003) + A9004) ... A10000)
since any given step may lose precision to a different degree.
This does not mean that either result is more correct, only that you may get unexpected results.
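A three-value sketch makes the non-associativity concrete (values chosen so that the two groupings round differently):

#include <iostream>

int main()
{
    float a = 16777216.0f, b = 1.0f, c = 1.0f;
    float left  = (a + b) + c;   // 16777216.0f: each 1.0f is rounded away in turn
    float right = a + (b + c);   // 16777218.0f: b + c == 2.0f, which survives
    std::cout << (left == right) << std::endl;   // prints 0
    return 0;
}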
If you really have to use float, try the following: sort the values and sum them in ascending order, so that small values accumulate with each other before they meet a large running total (see the sketch below).
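A minimal sketch of that idea; the helper name sortedSum is mine, not from the original code:

#include <algorithm>
#include <cstddef>
#include <vector>

// Sort ascending, then sum: small values combine with each other
// before being added to a large running total.
float sortedSum(std::vector<float> v)   // takes a copy, so the caller's data stays unsorted
{
    std::sort(v.begin(), v.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < v.size(); i++)
        sum += v[i];
    return sum;
}

Taking the vector by value keeps the caller's data untouched; for very large inputs you may prefer to sort in place and pass by reference instead.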
Note that this changes your algorithm from an O(N) operation to an O(N log N) one, but it's rather more likely to produce the correct number. It's also quite parallelisable, and you may be able to merge the sort and sum operations if you are clever about it.