A example in Gotw 67

Question

There is a example in http://www.gotw.ca/gotw/067.htm

int main()
{
  double x = 1e8;
  //float x = 1e8;
  while( x > 0 )
  {
    --x;
  }
}

When you change the double to float, it's a infinite loop in VS2008. According to the Gotw explanation:

What if float can't exactly represent all integer values from 0 to 1e8? Then the modified program will start counting down, but will eventually reach a value N which can't be represented and for which N-1 == N (due to insufficient floating-point precision)... and then the loop will stay stuck on that value until the machine on which the program is running runs out of power.

From what I understand, the IEEE754 float is a single precision(32 bits) and the range of float should be +/- 3.4e +/- 38 and it should have a 7 digits significant.

But I still don't understand how exactly this happens: "eventually reach a value N which can't be represented and for which N-1 == N (due to insufficient floating-point precision)." Can someone try to explan this bit ?

A bit of extra info : When I use double x = 1e8, it finished in about 1 sec, when I change it to float x = 1e8, it runs much longer(still running after 5 min), also if I change it to float x = 1e7;, it finished in about 1 second.

My testing environment is VS2008.

BTW I'm NOT asking the basic IEEE 754 format explanation as I already understand that.

Thanks

Roland Illig · Accepted Answer

The operation x-- is (in this case) equivalent to x = x - 1. That means the original value of x is taken, 1 is subtracted (using infinite precision, as mandated by IEEE 754-1985), and then the result is rounded to the next value of the float value space.

The rounded result for the numbers 1.0e8f + i is given for i in [-10;10] below:

 -10: 9.9999992E7     (binary +|10011001|01111101011110000011111)
  -9: 9.9999992E7     (binary +|10011001|01111101011110000011111)
  -8: 9.9999992E7     (binary +|10011001|01111101011110000011111)
  -7: 9.9999992E7     (binary +|10011001|01111101011110000011111)
  -6: 9.9999992E7     (binary +|10011001|01111101011110000011111)
  -5: 9.9999992E7     (binary +|10011001|01111101011110000011111)
  -4: 1.0E8           (binary +|10011001|01111101011110000100000)
  -3: 1.0E8           (binary +|10011001|01111101011110000100000)
  -2: 1.0E8           (binary +|10011001|01111101011110000100000)
  -1: 1.0E8           (binary +|10011001|01111101011110000100000)
   0: 1.0E8           (binary +|10011001|01111101011110000100000)
   1: 1.0E8           (binary +|10011001|01111101011110000100000)
   2: 1.0E8           (binary +|10011001|01111101011110000100000)
   3: 1.0E8           (binary +|10011001|01111101011110000100000)
   4: 1.0E8           (binary +|10011001|01111101011110000100000)
   5: 1.00000008E8    (binary +|10011001|01111101011110000100001)
   6: 1.00000008E8    (binary +|10011001|01111101011110000100001)
   7: 1.00000008E8    (binary +|10011001|01111101011110000100001)
   8: 1.00000008E8    (binary +|10011001|01111101011110000100001)
   9: 1.00000008E8    (binary +|10011001|01111101011110000100001)
  10: 1.00000008E8    (binary +|10011001|01111101011110000100001)

So you can see that 1.0e8f and 1.0e8f + 4 and some other numbers have the same representation. Since you already know the details of the IEEE 754-1985 floating point formats, you also know that the remaining digits must have been rounded away.

janneb · Answer

Well, for the sake of argument, lets assume we have a processor which represents a floating point number with 7 significant decimal digits, and an mantissa with, say, 2 decimal digits. So now the number 1e8 would be stored as

1.000 000 e 08

(where the "." and "e" need not be actually stored.)

So now you want to compute "1e8 - 1". 1 is represented as

1.000 000 e 00

Now, in order to do the subtraction we first do a subtraction with infinite precision, then normalize so that the first digit before the "." is between 1 and 9, and finally round to the nearest representable value (with break on even, say). The infinite precision result of "1e8 - 1" is

0.99 999 999 e 08

or normalized

9.9 999 999 e 07

As can be seen, the infinite precision result needs one more digit in the significand than what our architecture actually provides; hence we need to round (and re-normalize) the infinitely precise result to 7 significant digits, resulting in

1.000 000 e 08

Hence you end up with "1e8 - 1 == 1e8" and your loop never terminates.

Now, in reality you're using IEEE 754 binary floats, which are a bit different, but the principle is roughly the same.

visitor · Answer

What is the result of n - 1 if n - 1 and n have both identical representation due to the approximate nature of floating point numbers?

A example in Gotw 67

Tags:

c++

floating-point

precision

gotw

RoundPi

3 Answers

Roland Illig

janneb

visitor

Recent Activity

Donate For Us

A example in Gotw 67

Tags:

c++

floating-point

precision

gotw

RoundPi

3 Answers

Roland Illig

janneb

visitor

Related questions

Recent Activity

Donate For Us