It appears from other StackOverflow questions and reading §9.5.1 of the ISO/IEC draft C++ standard standard that the use of unions to do a literal reinterpret_cast
of data is undefined behavior.
Consider the code below. The goal is to take the integer value of 0xffff
and literally interpret it as a series of bits in IEEE 754 floating point. (Binary convert shows visually how this is done.)
#include <iostream>
using namespace std;
union unionType {
int myInt;
float myFloat;
};
int main() {
int i = 0xffff;
unionType u;
u.myInt = i;
cout << "size of int " << sizeof(int) << endl;
cout << "size of float " << sizeof(float) << endl;
cout << "myInt " << u.myInt << endl;
cout << "myFloat " << u.myFloat << endl;
float theFloat = *reinterpret_cast<float*>(&i);
cout << "theFloat " << theFloat << endl;
return 0;
}
The output of this code, using both GCC and clang compilers is expected.
size of int 4
size of float 4
myInt 65535
myFloat 9.18341e-41
theFloat 9.18341e-41
My question is, does the standard actually preclude the value of myFloat
from being deterministic? Is the use of a reinterpret_cast
better in any way to perform this type of conversion?
The standard states the following in §9.5.1:
In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...] The size of a union is sufficient to contain the largest of its non-static data members. Each non-static data member is allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.
The last sentence, guaranteeing that all non-static members have the same address, seems to indicate the use of a union is guaranteed to be identical to the use of a reinterpret_cast
, but the earlier statement about active data members seems to preclude this guarantee.
So which construct is more correct?
Edit:
Using Intel's icpc
compiler, the above code produces even more interesting results:
$ icpc union.cpp
$ ./a.out
size of int 4
size of float 4
myInt 65535
myFloat 0
theFloat 0
Purpose for using reinterpret_cast It is used when we want to work with bits. If we use this type of cast then it becomes a non-portable product. So, it is suggested not to use this concept unless required. It is only used to typecast any pointer to its original type.
The reinterpret_cast allows the pointer to be treated as an integral type. The result is then bit-shifted and XORed with itself to produce a unique index (unique to a high degree of probability). The index is then truncated by a standard C-style cast to the return type of the function.
Anyway, the consequence of this is, that reinterpret_cast<> is portable as long as you do not rely on the byte order in any way. Your example code does not rely on byte order, it treats all bytes the same (setting them to zero), so that code is portable.
the result of a pointer-to-pointer reinterpret_cast operation can't safely be used for anything other than being cast back to the original pointer type.
The reason it's undefined is because there's no guarantee what exactly the value representations of int
and float
are. The C++ standard doesn't say that a float
is stored as an IEEE 754 single-precision floating point number. What exactly should the standard say about you treating an int
object with value 0xffff
as a float
? It doesn't say anything other than the fact it is undefined.
Practically, however, this is the purpose of reinterpret_cast
- to tell the compiler to ignore everything it knows about the types of objects and trust you that this int
is actually a float
. It's almost always used for machine-specific bit-level jiggery-pokery. The C++ standard just doesn't guarantee you anything once you do it. At that point, it's up to you to understand exactly what your compiler and machine do in this situation.
This is true for both the union
and reinterpret_cast
approaches. I suggest that reinterpret_cast
is "better" for this task, since it makes the intent clearer. However, keeping your code well-defined is always the best approach.
It's not undefined behavior. It's implementation defined behavior. The first does mean that bad things can happen. The other means that what will happen has to be defined by the implementation.
The reinterpret_cast violates the strict aliasing rule. So I do not think it will work reliably. The union trick is what people call type-punning
and is usually allowed by compilers. The gcc folks document the behavior of the compiler: http://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit_002dfields-implementation.html#Structures-unions-enumerations-and-bit_002dfields-implementation
I think this should work with icpc as well (but they do not appear to document how they implemented that). But when I looked the assembly, it looks like icc tries to cheat with float and use higher precision floating point stuff. Passing -fp-model source
to the compiler fixed that. With that option, I get the same results as with gcc.
I do not think you want to use this flag in general, this is just a test to verify my theory.
So for icpc, I think if you switch your code from int/float to long/double, type-punning will work on icpc as well.
Undefined behavior does not mean bad things must happen. It means only that the language definition doesn't tell you what happens. This kind of type pun has been part of C and C++ programming since time immemorial (i.e., since 1969); it would take a particularly perverse implementor to write a compiler where this didn't work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With