I tried to find out where float loses the ability to exactly represent large integers. So I wrote this little snippet:
int main() { for (int i=0; ; i++) { if ((float)i!=i) { return i; } } }
This code seems to work with all compilers except clang, which generates a simple infinite loop (see Godbolt).
Is this allowed? If yes, is it a QoI issue?
1. Do not use floating-point numbers if integers will suffice. As a corollary, do not use floating-point numbers if exact computation is required; instead, scale the values if a fixed number of decimals will always be used. For example, if the values are dollars, do the computations in cents and divide by 100 when done.
Floating point numbers are used to represent noninteger fractional numbers and are used in most engineering and technical calculations, for example, 3.256, 2.1, and 0.0036. The most commonly used floating point standard is the IEEE standard.
The data type float has 24 bits of precision. This is equivalent to only about 7 decimal places. (The rest of the 32 bits are used for the sign and size of the number.) The number of places of precision for float is the same no matter what the size of the number.
One source of error in floating-point division is the algorithm used to calculate the quotient. Some systems compute division by multiplying by a reciprocal, replacing Z = X / Y with Z = X * (1/Y); that rounds twice, so the result can differ from the correctly rounded quotient.
Note that the built-in operator != requires its operands to be of the same type, and will achieve that using promotions and conversions if necessary. In other words, your condition is equivalent to:
(float)i != (float)i
That should never fail, and so the code will eventually overflow i, giving your program Undefined Behaviour. Any behaviour is therefore possible.
To correctly check what you want to check, you should cast the result back to int:
if ((int)(float)i != i)
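A minimal sketch of the corrected check as a complete function, assuming 32-bit int and IEEE 754 binary32 float. The function name and the loop bound are not from the question; the bound is an assumption added here so that the loop never overflows i and the conversion back to int never sees a value outside int's range.

```cpp
#include <climits>

// Returns the first non-negative int that does not survive an
// int -> float -> int round trip, or -1 if none is found below the bound.
// (Hypothetical helper name, for illustration only.)
int first_inexact_int() {
    // Stop at INT_MAX / 2: even if (float)i rounds upward, the result is
    // still representable as int, so the cast back is well-defined.
    for (int i = 0; i < INT_MAX / 2; ++i) {
        if (static_cast<int>(static_cast<float>(i)) != i) {
            return i;
        }
    }
    return -1;
}
```

Under those assumptions the round trip first fails at 16777217, because that value rounds to 16777216 when converted to float.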
As @Angew pointed out, the != operator needs the same type on both sides. (float)i != i results in promotion of the RHS to float as well, so we have (float)i != (float)i.
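The conversion can be checked at compile time. This is a small sketch, not part of the original answers: the static_assert uses an addition only because its result type exposes the common type that the usual arithmetic conversions pick for a float and an int. The two function names are made up for illustration.

```cpp
#include <type_traits>

// The usual arithmetic conversions choose float as the common type of
// (float, int); decltype of an arithmetic expression shows that type.
static_assert(std::is_same<decltype(1.0f + 1), float>::value,
              "float combined with int is evaluated in float");

// The comparison from the question, and the same comparison with the
// implicit conversion written out. Both compare a float against a float.
bool compare_implicit(int i) { return (float)i != i; }
bool compare_explicit(int i) { return (float)i != (float)i; }
```

Both functions return false for every int, including values such as 16777217 that are not exactly representable as float.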
g++ also generates an infinite loop, but it doesn't optimize away the work from inside it. You can see it converts int->float with cvtsi2ss and does ucomiss xmm0,xmm0 to compare (float)i with itself. (That was your first clue that your C++ source doesn't mean what you thought it did, as @Angew's answer explains.)
x != x is only true when it's "unordered" because x was NaN. (INFINITY compares equal to itself in IEEE math, but NaN doesn't. NAN == NAN is false, NAN != NAN is true.)
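A short sketch of that self-comparison rule, assuming IEEE 754 semantics and no -ffast-math style options (which let the compiler assume NaN never occurs). The helper names are hypothetical.

```cpp
#include <limits>

// True only when x is NaN: every other value, including infinity,
// compares equal to itself under IEEE 754 rules.
bool differs_from_itself(float x) { return x != x; }

// NaN is unordered with respect to itself, so != yields true.
bool nan_is_unordered() {
    return differs_from_itself(std::numeric_limits<float>::quiet_NaN());
}

// Infinity is an ordinary ordered value for comparison purposes.
bool infinity_equals_itself() {
    return !differs_from_itself(std::numeric_limits<float>::infinity());
}
```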
gcc7.4 and older correctly optimize your code to jnp as the loop branch (https://godbolt.org/z/fyOhW1): keep looping as long as the operands to x != x weren't NaN. (gcc8 and later also check je to break out of the loop, failing to optimize based on the fact that it will always be true for any non-NaN input.) x86 FP compares set PF on unordered.
And BTW, that means clang's optimization is also safe: it just has to CSE (float)i != (implicit conversion to float)i as being the same, and prove that i -> float is never NaN for the possible range of int.
(Although given that this loop will hit signed-overflow UB, it's allowed to emit literally any asm it wants, including a ud2 illegal instruction, or an empty infinite loop regardless of what the loop body actually was.) But ignoring the signed-overflow UB, this optimization is still 100% legal.
GCC fails to optimize away the loop body even with -fwrapv to make signed-integer overflow well-defined (as 2's complement wraparound). https://godbolt.org/z/t9A8t_
Even enabling -fno-trapping-math doesn't help. (GCC's default is unfortunately to enable -ftrapping-math even though GCC's implementation of it is broken/buggy.) int->float conversion can cause an FP inexact exception (for numbers too large to be represented exactly), so with exceptions possibly unmasked it's reasonable not to optimize away the loop body. (Because converting 16777217 to float could have an observable side-effect if the inexact exception is unmasked.)
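To see the inexact exception as a sticky flag rather than a trap, something like the following sketch could be used. It assumes a target where the conversion is done in hardware at run time (for example x86-64 with SSE), and it relies on volatile to stop the compiler from folding the conversion away. Strictly speaking, ISO C++ only makes flag tests reliable under #pragma STDC FENV_ACCESS ON, which compilers support to varying degrees, so treat this as an illustration rather than portable code. The function name is made up.

```cpp
#include <cfenv>

// Returns true if converting `value` to float raised the sticky
// FE_INEXACT flag. The volatile objects keep the conversion at run time
// and pinned between the two <cfenv> calls in this sketch.
bool conversion_is_inexact(int value) {
    volatile int input = value;
    volatile float sink;
    std::feclearexcept(FE_INEXACT);     // start from a clean flag
    sink = static_cast<float>(input);   // the conversion under test
    (void)sink;
    return std::fetestexcept(FE_INEXACT) != 0;
}
```

With binary32 float, 16777216 converts exactly while 16777217 has to be rounded, so only the second conversion is expected to set the flag.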
But with -O3 -fwrapv -fno-trapping-math, it's 100% a missed optimization not to compile this to an empty infinite loop. Without #pragma STDC FENV_ACCESS ON, the state of the sticky flags that record masked FP exceptions is not an observable side-effect of the code. No int->float conversion can result in NaN, so x != x can't be true.
These compilers are all optimizing for C++ implementations that use IEEE 754 single-precision (binary32) float and 32-bit int.
The bugfixed (int)(float)i != i loop would have UB on C++ implementations with narrow 16-bit int and/or wider float, because you'd hit signed-integer overflow UB before reaching the first integer that wasn't exactly representable as a float.
But UB under a different set of implementation-defined choices doesn't have any negative consequences when compiling for an implementation like gcc or clang with the x86-64 System V ABI.
BTW, you could statically calculate the result of this loop from FLT_RADIX and FLT_MANT_DIG, defined in <cfloat>. Or at least you can in theory, if float actually fits the model of an IEEE float rather than some other kind of real-number representation like a Posit / unum.
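A sketch of that static calculation, assuming float follows the radix/precision model that the <cfloat> macros describe and that the result fits in long long (true for binary32, not necessarily for a much wider float). Every integer up to FLT_RADIX^FLT_MANT_DIG fits in the significand, so the next integer is the first one that cannot be represented exactly. The function name is hypothetical, and the constexpr loop needs C++14 or later.

```cpp
#include <cfloat>

// First positive integer that is not exactly representable as float,
// computed from the significand width instead of by searching.
constexpr long long first_inexact_from_limits() {
    long long limit = 1;
    // limit becomes FLT_RADIX raised to the power FLT_MANT_DIG.
    for (int digit = 0; digit < FLT_MANT_DIG; ++digit) {
        limit *= FLT_RADIX;
    }
    return limit + 1;
}
```

For binary32 (FLT_RADIX 2, FLT_MANT_DIG 24) this evaluates to 2^24 + 1 = 16777217.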
I'm not sure how much the ISO C++ standard nails down about float behaviour and whether a format that wasn't based on fixed-width exponent and significand fields would be standards compliant.
In comments:
@geza I would be interested to hear the resulting number!
@nada: it's 16777216
Are you claiming you got this loop to print / return 16777216?
Update: since that comment has been deleted, I think not. Probably the OP is just quoting the float before the first integer that can't be exactly represented as a 32-bit float (https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Precision_limits_on_integer_values), i.e. what they were hoping to verify with this buggy code.
The bugfixed version would of course print 16777217, the first integer that's not exactly representable, rather than the value before that.
(All the higher float values are exact integers, but they're multiples of 2, then 4, then 8, etc. for exponent values higher than the significand width. Many higher integer values can be represented, but 1 unit in the last place (of the significand) is greater than 1 so they're not contiguous integers. The largest finite float is just below 2^128, which is too large for even int64_t.)
If any compiler did exit the original loop and print that, it would be a compiler bug.
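The spacing of float values above 2^24 mentioned earlier (multiples of 2, then 4, and so on) can be observed with std::nextafter. This is a sketch assuming binary32 float; the helper name is made up.

```cpp
#include <cmath>

// Gap between x and the next representable float above it,
// i.e. one unit in the last place at x.
float gap_above(float x) {
    return std::nextafter(x, INFINITY) - x;
}
```

Under that assumption, gap_above(8388608.0f) (2^23) is 1, gap_above(16777216.0f) (2^24) is 2, and gap_above(33554432.0f) (2^25) is 4, which is why 16777217 has no exact float representation.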