I had a small WTF moment this morning. Ths WTF can be summarized with this:
float x = 0.2f;
float y = 0.1f;
float z = x + y;
assert(z == x + y); //This assert is triggered! (Atleast with visual studio 2008)
The reason seems to be that the expression x + y
is promoted to double and compared with the truncated version in z
. (If i change z
to double
the assert isn't triggered).
I can see that for precision reasons it would make sense to perform all floating point arithmetics in double precision before converting the result to single precision. I found the following paragraph in the standard (which I guess I sort of already knew, but not in this context):
4.6.1.
"An rvalue of type float
can be converted to an rvalue of type double
. The value is unchanged"
My question is, is x + y
guaranteed to be promoted to double or is at the compiler's discretion?
UPDATE: Since many people has claimed that one shouldn't use ==
for floating point, I just wanted to state that in the specific case I'm working with, an exact comparison is justified.
Floating point comparision is tricky, here's an interesting link on the subject which I think hasn't been mentioned.
Using the floating point promotion rules, a value of type float can be converted to a value of type double . In the second call to printDouble() , the float literal 4.0f is promoted into a double , so that the type of argument matches the type of the function parameter.
Double is the default decimal point type for Java. double dnum = 2.344; If high precision is not required and the program only needs a huge array of decimal numbers to be stored, float is a cost-effective way of storing data and saves memory.
Though both Java float vs Double is approximate types, use double if you need more precise and accurate results. Use float if you have memory constraint because it takes almost half as much space as double. If your numbers cannot fit in the range offered by float, then use double.
Size: Float is of size 32 bits while double is of size 64 bits. Hence, double can handle much bigger fractional numbers than float. They differ in the allocation of bits for the representation of the number. Both float and double use 1 bit for representing the sign of the number.
You can't generally assume that ==
will work as expected for floating point types. Compare rounded values or use constructs like abs(a-b) < tolerance
instead.
Promotion is entirely at the compiler's discretion (and will depend on target hardware, optimisation level, etc).
What's going on in this particular case is almost certainly that values are stored in FPU registers at a higher precision than in memory - in general, modern FPU hardware works with double or higher precision internally whatever precision the programmer asked for, with the compiler generating code to make the appropriate conversions when values are stored to memory; in an unoptimised build, the result of x+y
is still in a register at the point the comparison is made but z
will have been stored out to memory and fetched back, and thus truncated to float precision.
The Working draft for the next standard C++0x section 5 point 11 says
The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby
So at the compiler's discretion.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With