There is a example in http://www.gotw.ca/gotw/067.htm
int main()
{
double x = 1e8;
//float x = 1e8;
while( x > 0 )
{
--x;
}
}
When you change the double to float, it's a infinite loop in VS2008. According to the Gotw explanation:
What if float can't exactly represent all integer values from 0 to 1e8? Then the modified program will start counting down, but will eventually reach a value N which can't be represented and for which N-1 == N (due to insufficient floating-point precision)... and then the loop will stay stuck on that value until the machine on which the program is running runs out of power.
From what I understand, the IEEE754 float is a single precision(32 bits) and the range of float should be +/- 3.4e +/- 38 and it should have a 7 digits significant.
But I still don't understand how exactly this happens: "eventually reach a value N which can't be represented and for which N-1 == N (due to insufficient floating-point precision)." Can someone try to explan this bit ?
A bit of extra info : When I use double x = 1e8, it finished in about 1 sec, when I change it to
float x = 1e8, it runs much longer(still running after 5 min), also if I change it to float x = 1e7;
, it finished in about 1 second.
My testing environment is VS2008.
BTW I'm NOT asking the basic IEEE 754 format explanation as I already understand that.
Thanks
The operation x--
is (in this case) equivalent to x = x - 1
. That means the original value of x
is taken, 1
is subtracted (using infinite precision, as mandated by IEEE 754-1985), and then the result is rounded to the next value of the float
value space.
The rounded result for the numbers 1.0e8f + i
is given for i in [-10;10]
below:
-10: 9.9999992E7 (binary +|10011001|01111101011110000011111)
-9: 9.9999992E7 (binary +|10011001|01111101011110000011111)
-8: 9.9999992E7 (binary +|10011001|01111101011110000011111)
-7: 9.9999992E7 (binary +|10011001|01111101011110000011111)
-6: 9.9999992E7 (binary +|10011001|01111101011110000011111)
-5: 9.9999992E7 (binary +|10011001|01111101011110000011111)
-4: 1.0E8 (binary +|10011001|01111101011110000100000)
-3: 1.0E8 (binary +|10011001|01111101011110000100000)
-2: 1.0E8 (binary +|10011001|01111101011110000100000)
-1: 1.0E8 (binary +|10011001|01111101011110000100000)
0: 1.0E8 (binary +|10011001|01111101011110000100000)
1: 1.0E8 (binary +|10011001|01111101011110000100000)
2: 1.0E8 (binary +|10011001|01111101011110000100000)
3: 1.0E8 (binary +|10011001|01111101011110000100000)
4: 1.0E8 (binary +|10011001|01111101011110000100000)
5: 1.00000008E8 (binary +|10011001|01111101011110000100001)
6: 1.00000008E8 (binary +|10011001|01111101011110000100001)
7: 1.00000008E8 (binary +|10011001|01111101011110000100001)
8: 1.00000008E8 (binary +|10011001|01111101011110000100001)
9: 1.00000008E8 (binary +|10011001|01111101011110000100001)
10: 1.00000008E8 (binary +|10011001|01111101011110000100001)
So you can see that 1.0e8f
and 1.0e8f + 4
and some other numbers have the same representation. Since you already know the details of the IEEE 754-1985 floating point formats, you also know that the remaining digits must have been rounded away.
Well, for the sake of argument, lets assume we have a processor which represents a floating point number with 7 significant decimal digits, and an mantissa with, say, 2 decimal digits. So now the number 1e8 would be stored as
1.000 000 e 08
(where the "." and "e" need not be actually stored.)
So now you want to compute "1e8 - 1". 1 is represented as
1.000 000 e 00
Now, in order to do the subtraction we first do a subtraction with infinite precision, then normalize so that the first digit before the "." is between 1 and 9, and finally round to the nearest representable value (with break on even, say). The infinite precision result of "1e8 - 1" is
0.99 999 999 e 08
or normalized
9.9 999 999 e 07
As can be seen, the infinite precision result needs one more digit in the significand than what our architecture actually provides; hence we need to round (and re-normalize) the infinitely precise result to 7 significant digits, resulting in
1.000 000 e 08
Hence you end up with "1e8 - 1 == 1e8" and your loop never terminates.
Now, in reality you're using IEEE 754 binary floats, which are a bit different, but the principle is roughly the same.
What is the result of n - 1 if n - 1 and n have both identical representation due to the approximate nature of floating point numbers?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With