I observed some surprising behavior when testing simple arithmetic operations at the limits of double's range, on an x86 architecture:
const double max = 9.9e307; // Near std::numeric_limits<double>::max()
const double init[] = { max, max, max };
const valarray<double> myvalarray(init, 3);
const double mysum = myvalarray.sum();
cout << "Sum is " << mysum << endl; // Sum is 1.#INF
const double myavg1 = mysum/myvalarray.size();
cout << "Average (1) is " << myavg1 << endl; // Average (1) is 1.#INF
const double myavg2 = myvalarray.sum()/myvalarray.size();
cout << "Average (2) is " << myavg2 << endl; // Average (2) is 9.9e+307
(Tested with MSVC in release mode, as well as gcc through Codepad.org. MSVC's debug mode sets average (2) to #INF.)
I expected average (2) to equal average (1), but it seems to me the compiler optimized the C++ built-in division operator in a way that somehow kept the accumulation from reaching #INF.
In short: the average of big numbers doesn't yield #INF.
I observed the same behavior with an std algorithm on MSVC:
const double mysum = accumulate(init, init+3, 0.);
cout << "Sum is " << mysum << endl; // Sum is 1.#INF
const double myavg1 = mysum/static_cast<size_t>(3);
cout << "Average (1) is " << myavg1 << endl; // Average (1) is 1.#INF
const double myavg2 = accumulate(init, init+3, 0.)/static_cast<size_t>(3);
cout << "Average (2) is " << myavg2 << endl; // Average (2) is 9.9e+307
(This time, however, gcc set average (2) to #INF: http://codepad.org/C5CTEYHj.)
Thanks
Arithmetic operations on floating-point numbers consist of addition, subtraction, multiplication, and division. They are performed with algorithms similar to those used for sign-magnitude integers (because the representations are similar); for example, only numbers of the same sign are added directly.
Floating-point operations typically take longer to execute than simple binary integer operations. For this reason, most embedded applications avoid widespread use of floating-point math in favor of faster, smaller integer operations.
Store the value in a higher-precision variable, e.g., use double step instead of float step. The calculated value then won't be rounded an extra time, so precision will be higher.
One method to reduce floating-point error is to perform the calculation at higher precision than the original program. For example, one may replace 32-bit single precision with 64-bit double precision to improve the accuracy of the results.
Just a guess, but: it may be that average (2) is computed directly in the x87 floating-point registers, which are 80 bits wide and overflow much later than the 64-bit storage for doubles in memory. You should check the disassembly of your code to see whether that is indeed the case.