
What is the difference between a properly defined union and a reinterpret_cast?

Can you propose at least one scenario where there is a substantial difference between

union {
    T var_1;
    U var_2;
};

and

var_2 = reinterpret_cast<U> (var_1)

?

The more I think about this, the more they look like the same thing to me, at least from a practical viewpoint.

One difference I found is that a union is as large as its largest member, whereas the reinterpret_cast described in this post can truncate, so the plain old C-style union is even safer than the newer C++ cast.

Can you outline the differences between these two?

user2485710 asked Jul 29 '13 12:07



3 Answers

Contrary to what the other answers state, from a practical point of view there is a huge difference, although there might not be such a difference in the standard.

From the standard's point of view, reinterpret_cast is only guaranteed to work for round-trip conversions, and only if the alignment requirements of the intermediate pointer type are no stricter than those of the source type. You are not allowed (*) to write through one pointer type and read through another.
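As a rough illustration of that round-trip guarantee, here is a minimal sketch. The helper names are made up for this example, and it assumes 8-bit bytes. It converts a pointer to unsigned char* and back, and also reads one byte of the object, which falls under the char exception noted in (*).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round-trip a pointer through unsigned char*.
// unsigned char has the weakest alignment requirement, so the alignment
// condition is met and the original pointer value is recovered.
std::int64_t* roundtrip(std::int64_t* original) {
    unsigned char* intermediate = reinterpret_cast<unsigned char*>(original);
    return reinterpret_cast<std::int64_t*>(intermediate);
}

// Hypothetical helper: inspect the first byte of the object representation.
// Reading through unsigned char* is the exception noted in (*).
unsigned char first_byte(const std::int64_t& value) {
    return *reinterpret_cast<const unsigned char*>(&value);
}

void roundtrip_demo() {
    std::int64_t value = -1;               // all bits set in two's complement
    assert(roundtrip(&value) == &value);   // same pointer after the round trip
    assert(*roundtrip(&value) == -1);      // and it still refers to value
    assert(first_byte(value) == 0xFF);     // assumes 8-bit bytes
}
```

What the sketch does not do is dereference the intermediate pointer as some unrelated non-char type; that is the case the standard does not cover.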

At the same time, the standard requires similar behavior from unions: it is undefined behavior to read from a union member other than the active one (the member that was last written to) (+).

Yet compilers often provide additional guarantees for the union case, and all compilers I know of (VS, g++, clang++, xlC_r, Intel, Solaris CC) guarantee that you can read from a union through an inactive member and that doing so produces a value with exactly the same bits as those written through the active member.

This matters particularly at high optimization levels when reading data from the network:

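#include <cstdint>   // int64_t
#include <cstring>   // memcpy
// ntohll is not standard C++; it comes from a platform header where available.

double ntohdouble(const char *buffer) {          // [1]
   union {
      int64_t   i;
      double    f;
   } data;
   memcpy(&data.i, buffer, sizeof(int64_t));
   data.i = ntohll(data.i);                      // network to host byte order
   return data.f;                                // read through the inactive member
}

double ntohdouble(const char *buffer) {          // [2]
   int64_t data;
   double  dbl;
   memcpy(&data, buffer, sizeof(int64_t));
   data = ntohll(data);
   dbl = *reinterpret_cast<double*>(&data);      // reads an int64_t object as a double
   return dbl;
}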

The implementation in [1] is sanctioned by all compilers I know (gcc, clang, VS, sun, ibm, hp), while the implementation in [2] is not and will fail horribly in some of them when aggressive optimizations are used. In particular, I have seen gcc reorder the instructions and read into the dbl variable before evaluating ntohll, thus producing the wrong results.
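For comparison, here is a third variant that is not part of the original answer: a sketch that avoids both the union and the reinterpret_cast by copying the bytes with memcpy. The function name is hypothetical, and the code assumes a 64-bit IEEE-754 double whose byte order matches that of the host's integers. It assembles the integer byte by byte instead of calling ntohll, so it does not depend on that platform-specific function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode a big-endian (network order) double.
double ntohdouble_memcpy(const char* buffer) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits = 0;
    // Assemble from the most significant byte down; this works regardless
    // of the host's byte order.
    for (int i = 0; i < 8; ++i) {
        bits = (bits << 8) | static_cast<unsigned char>(buffer[i]);
    }
    double result;
    std::memcpy(&result, &bits, sizeof result);  // copy the object representation
    return result;
}

void ntohdouble_demo() {
    // 1.0 is 0x3FF0000000000000 in IEEE-754 binary64.
    const char one[8] = {0x3F, static_cast<char>(0xF0), 0, 0, 0, 0, 0, 0};
    assert(ntohdouble_memcpy(one) == 1.0);
}
```

Because memcpy copies bytes rather than accessing one type through a pointer to another, the reordering problem described for [2] should not arise here.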


(*) With the exception that you are always allowed to read through a [signed|unsigned] char*, regardless of what the real object (original pointer type) was.

(+) Again with some exceptions: if the active member shares a common prefix with another member, you can read that prefix through the compatible member.
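The common-prefix exception in (+) might look like the following sketch. The record types are invented for this example; both are standard-layout structs that begin with the same int member.

```cpp
#include <cassert>

// Hypothetical record types sharing the common initial member `kind`.
struct IntRecord   { int kind; int   value; };
struct FloatRecord { int kind; float value; };

union Record {
    IntRecord   as_int;
    FloatRecord as_float;
};

// Reads `kind` through as_int even when as_float is the active member;
// this is the common-prefix case described in (+).
int kind_of(const Record& r) {
    return r.as_int.kind;
}

void record_demo() {
    Record r;
    r.as_float = FloatRecord{2, 1.5f};  // as_float is now the active member
    assert(kind_of(r) == 2);            // prefix read through the other member
    assert(r.as_float.value == 1.5f);   // active member read as usual
}
```

Reading r.as_int.value here would fall outside the shared prefix and back into the undefined-behavior case.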

David Rodríguez - dribeas answered Oct 03 '22 12:10



There are some technical differences between a proper union and (let's assume) a proper and safe reinterpret_cast. However, I can't think of any of these differences that cannot be overcome.

The real reason to prefer a union over reinterpret_cast in my opinion isn't a technical one. It's for documentation.

Suppose you are designing a bunch of classes to represent a wire protocol (which I guess is the most common reason to use type-punning in the first place), and that wire protocol consists of many messages, submessages and fields. If some of those fields are common, such as msg type, seq#, etc., using a union simplifies tying these elements together and helps to document exactly how the protocol appears on the wire.

Using reinterpret_cast does the same thing, obviously, but in order to really know what's going on you have to examine the code that advances from one packet to the next. Using a union, you can just take a look at the header and get an idea of what's going on.
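To make the documentation argument concrete, here is a sketch of what such a header might contain. Every message name, field, and type code below is hypothetical, and a real protocol would also have to deal with padding and byte order, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// All names and fields below are hypothetical, for illustration only.
struct LoginBody { std::uint32_t user_id; };
struct QuoteBody { std::uint32_t symbol_id; std::uint32_t price; };

enum MsgType : std::uint8_t { kLogin = 1, kQuote = 2 };

struct Packet {
    std::uint8_t  msg_type;  // common field: selects the active body
    std::uint16_t seq;       // common field: sequence number
    union {                  // exactly one body follows the common fields
        LoginBody login;
        QuoteBody quote;
    } body;
};

// Size of the body that msg_type says is present, or 0 if unknown.
std::size_t body_size(const Packet& p) {
    switch (p.msg_type) {
        case kLogin: return sizeof p.body.login;
        case kQuote: return sizeof p.body.quote;
        default:     return 0;
    }
}

void packet_demo() {
    Packet p{};
    p.msg_type = kQuote;
    assert(body_size(p) == sizeof(QuoteBody));
}
```

The set of possible bodies is visible in the Packet declaration itself, whereas with reinterpret_cast the same information would be spread across the parsing code.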

John Dibling answered Oct 03 '22 11:10



In C++11, a union is a class type and can hold a member with non-trivial member functions. You can't simply cast from one member to another.

§ 9.5.3

[ Example: Consider the following union:

union U {
int i;
float f;
std::string s;
};

Since std::string (21.3) declares non-trivial versions of all of the special member functions, U will have an implicitly deleted default constructor, copy/move constructor, copy/move assignment operator, and destructor. To use U, some or all of these member functions must be user-provided. — end example ]
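One possible way to make the quoted U usable is sketched below. The constructor and destructor bodies and the helper function are assumptions for illustration, not taken from the standard's example. The std::string member has to be constructed with placement new and destroyed explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

union U {
    int i;
    float f;
    std::string s;

    U() : i(0) {}  // user-provided: start with i as the active member
    ~U() {}        // user-provided: does nothing, since the union cannot
                   // know which member is active
};

// Hypothetical helper: make s active, use it, then destroy it by hand.
std::size_t string_length_via_union(const char* text) {
    U u;
    new (&u.s) std::string(text);     // placement new begins the lifetime of s
    std::size_t length = u.s.size();
    u.s.~basic_string();              // end the lifetime of s explicitly
    return length;
}

void union_string_demo() {
    assert(string_length_via_union("hello") == 5);
}
```

The copy and move operations of U remain deleted in this sketch, so code that needs them would have to provide those as well.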

billz answered Oct 03 '22 12:10
