Understanding Guru of the Week #67: Double or Nothing

Recently, I was reading the post Double or Nothing from GOTW by Herb Sutter, and I am a little confused by the explanation of the following program:

 int main()
 {
     double x = 1e8;
     while( x > 0 )
     {
        --x;
     }
 }

Assume that this code takes about 1 second to run on some machine. I agree with the point that code like this is silly.

However, according to the explanation, if we change x from double to float, then on some compilers it will keep the computer running forever. The explanation is based on the following quote from the standard.

Quoting from section 3.9.1/8 of the C++ standard:

There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.

The question for the code is:

How long would you expect it to take if you change "double" to "float"? Why?

Here is the explanation given:

It will probably take either about 1 second (on a particular implementation floats may be somewhat faster, as fast, or somewhat slower than doubles), or forever, depending on whether or not float can exactly represent all integer values from 0 to 1e8 inclusive.

The above quote from the standard means that there may be values that can be represented by a double but that cannot be represented by a float. In particular, on some popular platforms and compilers, double can exactly represent all integer values in [0,1e8] but float cannot.

What if float can't exactly represent all integer values from 0 to 1e8? Then the modified program will start counting down, but will eventually reach a value N which can't be represented and for which N-1 == N (due to insufficient floating-point precision)... and

My question is:

If float is not even able to represent 1e8, then we should already have an overflow when we initialize float x = 1e8; so how can the program keep the computer running forever?

I tried a simple example here (though with int rather than double):

#include <iostream>

int main()
{
   int a = 4444444444444444444;   // far too large to fit in an int
   std::cout << "a " << a << std::endl;
   return 0;
}

It outputs: a -1357789412

This means that if the given number cannot be represented by the int type, the result overflows.

So did I misread? What point did I miss? Is changing x from double to float undefined behavior?

Thank you!

asked May 16 '13 by taocp
1 Answer

The key word is "exactly".

float can represent 1e8, even exactly, unless you have a freak float type. But that doesn't mean it can represent all smaller values exactly. For example, 2^25+1 = 33554433, which needs 26 bits of precision, usually cannot be exactly represented in a float (which usually has 23+1 bits of precision), nor can 2^25-1 = 33554431, which needs 25 bits of precision.
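A small sketch of that point (this is not from the original answer; it assumes the common IEEE-754 single-precision float with a 24-bit significand, so the output may differ on other implementations):

#include <iostream>

int main()
{
    std::cout << std::fixed;
    std::cout.precision(0);

    float big = 1e8f;          // 100000000 = 390625 * 2^8, fits in 24 bits
    std::cout << big << '\n';  // typically prints 100000000 -- exact, no overflow

    float a = 33554433.0f;     // 2^25 + 1, needs 26 bits of precision
    float b = 33554431.0f;     // 2^25 - 1, needs 25 bits of precision
    std::cout << a << '\n';    // typically prints 33554432
    std::cout << b << '\n';    // typically prints 33554432
    return 0;
}

In particular, initializing a float with 1e8 does not overflow: 1e8 happens to be one of the values a float can hold exactly, while 2^25+1 and 2^25-1 are not.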

Both of these numbers are then represented as 2^25 = 33554432, and then

33554432.0f - 1 == 33554432.0f

will loop. (You will actually hit such a loop earlier, at a larger value, but this one has a nice decimal representation ;)
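Here is a minimal sketch of that comparison (again assuming a 24-bit float significand; a compiler that evaluates the expression at higher precision may behave differently, as noted further below):

#include <iostream>

int main()
{
    float x = 33554432.0f;    // 2^25
    float y = x - 1.0f;       // the exact result 33554431 needs 25 bits,
                              // so it typically rounds back to 33554432
    std::cout << std::boolalpha << (y == x) << '\n';   // typically true

    // This is why  while (x > 0) --x;  can stop making progress once x
    // reaches such a value: the decrement no longer changes x.
    return 0;
}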

In integer arithmetic, you have x - 1 != x for all x, but not in floating point arithmetic.

Note that the loop might also finish even if float has only the usual 23+1 bits of precision, since the standard allows floating point computations to be carried out at a greater precision than the type has, and if the computation is performed at sufficiently greater precision (e.g. the usual double with 52+1 bits), every subtraction will change x.
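As a side note, you can ask an implementation how it evaluates floating-point expressions via the FLT_EVAL_METHOD macro from <cfloat> (this sketch assumes a C++11-or-later compiler that provides it; the value printed is platform-specific):

#include <cfloat>
#include <iostream>

int main()
{
    // FLT_EVAL_METHOD describes the precision of intermediate results:
    //    0  - operations are evaluated in the precision of their type
    //    1  - float and double operations are evaluated as double
    //    2  - all operations are evaluated as long double
    //   -1  - indeterminable (other negative values are implementation-defined)
    std::cout << "FLT_EVAL_METHOD = " << FLT_EVAL_METHOD << '\n';
    return 0;
}

With a value of 1 or 2, intermediate float arithmetic may be carried out with more precision than float itself, which is the situation the note above describes.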

answered Sep 26 '22 by Daniel Fischer