 

C++ unions vs. reinterpret_cast

It appears from other StackOverflow questions and from reading §9.5.1 of the ISO/IEC draft C++ standard that using a union to do a literal reinterpret_cast of data is undefined behavior.

Consider the code below. The goal is to take the integer value of 0xffff and literally interpret it as a series of bits in IEEE 754 floating point. (Binary convert shows visually how this is done.)

#include <iostream>
using namespace std;

union unionType {
    int myInt;
    float myFloat;
};

int main() {

    int i = 0xffff;

    unionType u;
    u.myInt = i;

    cout << "size of int    " << sizeof(int) << endl;
    cout << "size of float  " << sizeof(float) << endl;

    cout << "myInt          " << u.myInt << endl;
    cout << "myFloat        " << u.myFloat << endl;

    float theFloat = *reinterpret_cast<float*>(&i);
    cout << "theFloat       " << theFloat << endl;

    return 0;
}

The output of this code, with both the GCC and Clang compilers, is what I expected:

size of int    4
size of float  4
myInt          65535
myFloat        9.18341e-41
theFloat       9.18341e-41

My question is, does the standard actually preclude the value of myFloat from being deterministic? Is the use of a reinterpret_cast better in any way to perform this type of conversion?

The standard states the following in §9.5.1:

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...] The size of a union is sufficient to contain the largest of its non-static data members. Each non-static data member is allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.

The last sentence, guaranteeing that all non-static members have the same address, seems to indicate the use of a union is guaranteed to be identical to the use of a reinterpret_cast, but the earlier statement about active data members seems to preclude this guarantee.

So which construct is more correct?

Edit: Using Intel's icpc compiler, the above code produces even more interesting results:

$ icpc union.cpp
$ ./a.out
size of int    4
size of float  4
myInt          65535
myFloat        0
theFloat       0
kgraney asked May 19 '13 16:05




3 Answers

It's undefined because there's no guarantee about what exactly the value representations of int and float are. The C++ standard doesn't say that a float is stored as an IEEE 754 single-precision floating point number. What exactly should the standard say about you treating an int object with value 0xffff as a float? It says nothing beyond the fact that it is undefined.

Practically, however, this is the purpose of reinterpret_cast - to tell the compiler to ignore everything it knows about the types of objects and trust you that this int is actually a float. It's almost always used for machine-specific bit-level jiggery-pokery. The C++ standard just doesn't guarantee you anything once you do it. At that point, it's up to you to understand exactly what your compiler and machine do in this situation.

This is true for both the union and reinterpret_cast approaches. I suggest that reinterpret_cast is "better" for this task, since it makes the intent clearer. However, keeping your code well-defined is always the best approach.

Joseph Mansfield answered Oct 01 '22 02:10



It's not undefined behavior. It's implementation-defined behavior. The former means that bad things can happen. The latter means that what happens has to be defined by the implementation.

The reinterpret_cast violates the strict aliasing rule. So I do not think it will work reliably. The union trick is what people call type-punning and is usually allowed by compilers. The gcc folks document the behavior of the compiler: http://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit_002dfields-implementation.html#Structures-unions-enumerations-and-bit_002dfields-implementation

I think this should work with icpc as well (but they do not appear to document how they implemented it). When I looked at the assembly, though, it looked like icc tries to cheat with float and use higher-precision floating point stuff. Passing -fp-model source to the compiler fixed that. With that option, I get the same results as with gcc. I do not think you want to use this flag in general; this was just a test to verify my theory.

So for icpc, I think if you switch your code from int/float to long/double, type-punning will work on icpc as well.

Guillaume answered Oct 01 '22 04:10



Undefined behavior does not mean bad things must happen. It means only that the language definition doesn't tell you what happens. This kind of type pun has been part of C and C++ programming since time immemorial (i.e., since 1969); it would take a particularly perverse implementor to write a compiler where this didn't work.

Pete Becker answered Oct 01 '22 04:10
