 

Is a float guaranteed to be preserved when transported through a double in C/C++?

Assuming IEEE-754 conformance, is a float guaranteed to be preserved when transported through a double?

In other words, will the following assert always be satisfied?

    int main() {
        float f = some_random_float();
        assert(f == (float)(double)f);
    }

Assume that f could acquire any of the special values defined by IEEE, such as NaN and Infinity.

According to IEEE, is there a case where the assert will be satisfied, but the exact bit-level representation is not preserved after the transportation through double?

The code snippet is valid in both C and C++.

Kristian Spangsege asked Feb 08 '13 13:02




2 Answers

You don't even need to assume IEEE. C89 says in 3.1.2.5:

The set of values of the type float is a subset of the set of values of the type double

And every other C and C++ standard says equivalent things. As far as I know, NaNs and infinities are "values of the type float", albeit values with some special-case rules when used as operands.

The fact that the float -> double -> float conversion restores the original value of the float follows (in general) from the fact that numeric conversions all preserve the value if it's representable in the destination type.
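A minimal sketch of that round trip, assuming an IEEE-754 style implementation. The helper names `survives_roundtrip` and `check_samples` are made up for illustration, and the NaN branch compares by classification because `==` is never true for NaNs:

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from any standard): reports whether f
   survives a float -> double -> float round trip. */
static bool survives_roundtrip(float f)
{
    double wide = (double)f;    /* widening keeps the value */
    float back = (float)wide;   /* value is representable in float, so it is kept */

    if (isnan(f))
        return isnan(back);     /* NaN != NaN, so compare by classification */
    return back == f;
}

/* Checks a handful of ordinary and boundary values. */
static void check_samples(void)
{
    const float samples[] = {
        0.0f, 1.0f, 0.1f, -3.75f,
        FLT_EPSILON, FLT_MIN,
        FLT_MIN / 2.0f,         /* subnormal on IEEE-754 targets */
        FLT_MAX
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(survives_roundtrip(samples[i]));
}
```

Calling `check_samples()` from `main` should pass silently on a conforming implementation. It only samples a few values, so it illustrates the guarantee rather than proving it.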

Bit-level representations are a slightly different matter. Imagine that there's a value of float that has two distinct bitwise representations. Then nothing in the C standard prevents the float -> double -> float conversion from switching one to the other. In IEEE that won't happen for "actual values" unless there are padding bits, but I don't know whether IEEE rules out a single NaN having distinct bitwise representations. NaNs don't compare equal to themselves anyway, so the assert in the question fails whenever f is a NaN, and there's no standard way to tell whether two NaNs are "the same NaN" or "different NaNs" other than maybe converting them to strings. For NaNs, then, the bit-level question may be moot.
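To look at the bits rather than the value, something like the following could be used. It is only a sketch: it assumes float is 32 bits wide with no padding, and `float_bits`, `via_double` and `bits_preserved` are illustrative names, not standard functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copies the object representation of a float into a 32-bit integer.
   Assumption: float is exactly 32 bits wide, as on IEEE-754 platforms. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);   /* well-defined way to inspect the bytes */
    return bits;
}

/* The conversion chain from the question. */
static float via_double(float f)
{
    return (float)(double)f;
}

/* True if the bit pattern is identical before and after the round trip. */
static bool bits_preserved(float f)
{
    return float_bits(f) == float_bits(via_double(f));
}
```

Note that `0.0f == -0.0f` is true even though `float_bits` gives different results for the two, so a passing `==` does not by itself prove bit identity. For NaN inputs `bits_preserved` may return false on some implementations, for example if a signaling NaN comes back quieted.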

One thing to watch out for is non-conforming modes of compilers, in which they keep super-precise values "under the covers", for example intermediate results left in floating-point registers and reused without rounding. I don't think that would cause your example code to fail, but as soon as you're doing floating-point == it's the kind of thing you start worrying about.
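C99 exposes how intermediate results are evaluated through the `FLT_EVAL_METHOD` macro in <float.h>. A rough sketch of how one might inspect it, and of the rule that assignment and explicit casts discard any extra precision in a conforming mode (the function names are invented for this example):

```c
#include <assert.h>
#include <float.h>

/* Maps a FLT_EVAL_METHOD value to a short description (C99 5.2.4.2.2). */
static const char *eval_method_name(int method)
{
    switch (method) {
    case 0:  return "operations evaluated in the range and precision of their type";
    case 1:  return "float and double operations evaluated as double";
    case 2:  return "all operations evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* In a conforming mode, both assignment and an explicit cast round the
   product to float, so the two results must agree. */
static void check_cast_rounds(float a, float b)
{
    float stored = a * b;              /* assignment rounds to float */
    assert(stored == (float)(a * b));  /* so does the explicit cast */
}
```

Printing `eval_method_name(FLT_EVAL_METHOD)` shows what the compiler claims to do. If `check_cast_rounds` ever fires, that would suggest the compiler is running in one of the non-conforming modes described above.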

Steve Jessop answered Sep 29 '22 21:09


From C99:

6.3.1.5 Real floating types
1 When a float is promoted to double or long double, or a double is promoted to long double, its value is unchanged.
2 When a double is demoted to float, a long double is demoted to double or float, or a value being represented in greater precision and range than required by its semantic type (see 6.3.1.8) is explicitly converted to its semantic type, if the value being converted can be represented exactly in the new type, it is unchanged...

I think this guarantees that a float->double->float conversion preserves the original float value.

The standard also defines the macros INFINITY and NAN in 7.12 Mathematics <math.h>:

4 The macro INFINITY expands to a constant expression of type float representing positive or unsigned infinity, if available; else to a positive constant of type float that overflows at translation time.
5 The macro NAN is defined if and only if the implementation supports quiet NaNs for the float type. It expands to a constant expression of type float representing a quiet NaN.

So there's provision for such special values, and the conversions should work for them as well (including negative infinity and negative zero).
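Here is a small check of the special values, assuming the implementation provides infinities and quiet NaNs (that is, `INFINITY` and `NAN` are usable). `through_double` and `check_special_values` are hypothetical names used only for this sketch:

```c
#include <assert.h>
#include <math.h>

/* The conversion chain from the question. */
static float through_double(float f)
{
    return (float)(double)f;
}

static void check_special_values(void)
{
    float pos_inf  = through_double(INFINITY);
    float neg_inf  = through_double(-INFINITY);
    float neg_zero = through_double(-0.0f);
    float nan_val  = through_double(NAN);

    assert(isinf(pos_inf) && pos_inf > 0.0f);        /* still +infinity */
    assert(isinf(neg_inf) && neg_inf < 0.0f);        /* still -infinity */
    assert(neg_zero == 0.0f && signbit(neg_zero));   /* sign of zero kept */
    assert(isnan(nan_val));                          /* still a NaN */
    assert(nan_val != nan_val);                      /* NaN never equals itself */
}
```

The last assertion is why `assert(f == (float)(double)f)` from the question cannot hold when f is a NaN, even though the conversion itself still yields a NaN.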

Alexey Frunze answered Sep 29 '22 19:09