I built a parallel sum code to add up a large number of floats, and I found that when the number of elements is bigger than 100000000, the result goes wrong. I then built a serial version to compare, and the serial code also gives the wrong number. Does anybody know why? Thanks!
My simple code is as follows.
The result is "1.67772e+007". It should be 1e+008.
#include <iostream>
#include <vector>
#include <ctime>
#include "tbb/task_scheduler_init.h"
using namespace std;
using namespace tbb;

int main()
{
    size_t N = 100000000;
    cout << "n is : " << N << endl;

    clock_t start = clock();
    task_scheduler_init init;

    // fill a vector with N copies of 1.0f
    vector<float> myvec;
    vector<float>* pvec;
    for (size_t i = 0; i < N; i++)      // size_t to match N
        myvec.push_back(1.0f);
    pvec = &myvec;

    float mysum;
    mysum = parallelSum(pvec);          // parallel reduction (definition not shown)
    cout << " the p sum is: " << mysum << endl;

    clock_t finish = clock();
    cout << "Time Used = " << (finish - start) / CLOCKS_PER_SEC << endl;

    // serial reference sum
    mysum = 0;
    for (size_t i = 0; i < N; i++)
        mysum += myvec[i];
    cout << " the s sum is: " << mysum << endl;

    return 0;
}
Your problem is due to the limited precision of floating point numbers.
While
1.0f + 1.0f == 2.0f,
you will find that
16777216.0f + 1.0f == 16777216.0f
The extra 1.0f is simply thrown away, since 16777217 cannot be represented in float format.
Looking at your result, 1.67772e+007, it's clear that this is exactly what has happened.
Your expected result, 100000000.0, is about six times bigger than 16777216.0f, but once the running sum reaches 16777216.0f it stays there for the remaining 83222784 additions of 1.0f.
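You can verify the cutoff with a minimal standalone sketch (no TBB needed; the constants come straight from the explanation above):

#include <iostream>

int main()
{
    float f = 16777216.0f;                      // 2^24
    std::cout << (f + 1.0f == f) << std::endl;  // prints 1: the 1.0f is rounded away
    std::cout << (f + 2.0f == f) << std::endl;  // prints 0: 16777218.0f is representable
    return 0;
}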
Solution: try using double instead of float, which goes up to 9007199254740992.0 before hitting this problem.
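Applied to your serial loop, that is a one-type change; a minimal sketch, reusing your variable names and filling the vector the same way:

#include <iostream>
#include <vector>
using namespace std;

int main()
{
    size_t N = 100000000;
    vector<float> myvec(N, 1.0f);    // same data as your push_back loop

    double mysum = 0.0;              // 53 bits of precision: integers exact up to 2^53
    for (size_t i = 0; i < N; i++)
        mysum += myvec[i];           // each 1.0f converts to double exactly
    cout << " the s sum is: " << mysum << endl;   // prints 1e+08
    return 0;
}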
In single precision floating point there are only 24 bits of precision available, and 2^24 is 16777216. There is no way to encode 16777217 in 24 bits, so the sum simply stays at 16777216, on the reasoning that it's close enough to the real answer. The real problem arises when you add lots of very small numbers to a big number, where the sum of the small numbers is significant relative to the big one, even though individually they are not.
Note that 16777216.0f is not the largest number that can be represented in float; it just marks the limit of precision. For example, 16777216.0f x 2^4 + 2^4 => 16777216.0f x 2^4.
double has 53 bits of precision, so it can encode integers up to 2^53, or 9007199254740992.0, before hitting the point where adding 1.0 fails.
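The same experiment at double's limit, using the constant from the paragraph above:

#include <iostream>

int main()
{
    double d = 9007199254740992.0;              // 2^53
    std::cout << (d + 1.0 == d) << std::endl;   // prints 1: 2^53 + 1 is not representable
    std::cout << (d + 2.0 == d) << std::endl;   // prints 0: 2^53 + 2 is
    return 0;
}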
This issue also represents another hazard for parallelising floating point operations: floating point addition is not associative. In other words, your sequential algorithm:
Sum(A) = (...((((A1 + A2) + A3) + A4) ... A10000)
may produce a different result from the parallelised version:
Sum(A) = (...((((A1 + A2) + A3) + A4) ... A1000)
+ (...((((A1001 + A1002) + A1003) + A1004) ... A2000)
+ (...((((A2001 + A2002) + A2003) + A2004) ... A3000)
...
+ (...((((A9001 + A9002) + A9003) + A9004) ... A10000)
since any given step may lose precision to a different degree.
This does not mean that either result is more correct, only that you may get unexpected results.
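A three-value sketch makes the non-associativity concrete (values chosen so that the two groupings round differently):

#include <iostream>

int main()
{
    float a = 16777216.0f, b = 1.0f, c = 1.0f;
    float left  = (a + b) + c;   // 16777216.0f: each 1.0f is rounded away in turn
    float right = a + (b + c);   // 16777218.0f: b + c == 2.0f, which survives
    std::cout << (left == right) << std::endl;   // prints 0
    return 0;
}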
If you really have to use float, try the following: sort the values and sum them in ascending order, so that small values accumulate with each other before they meet a large running total (see the sketch below).
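A minimal sketch of that idea; the helper name sortedSum is mine, not from the original code:

#include <algorithm>
#include <cstddef>
#include <vector>

// Sort ascending, then sum: small values combine with each other
// before being added to a large running total.
float sortedSum(std::vector<float> v)   // takes a copy, so the caller's data stays unsorted
{
    std::sort(v.begin(), v.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < v.size(); i++)
        sum += v[i];
    return sum;
}

Taking the vector by value keeps the caller's data untouched; for very large inputs you may prefer to sort in place and pass by reference instead.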
Note that this changes your algorithm from an O(N) operation to an O(N log N) one, but it's rather more likely to produce the correct number. It's also quite parallelisable, and you may be able to merge the sort and sum operations if you are clever about it.