Can you propose at least 1 scenario where there is a substantial difference between
union {
T var_1;
U var_2;
}
and
var_2 = reinterpret_cast<U> (var_1)
?
The more I think about this, the more they look like the same thing to me, at least from a practical viewpoint.
One difference I found is that a union is at least as large as its largest member, whereas the reinterpret_cast as described in this post can lead to truncation, so the plain old C-style union is arguably even safer than the newer C++ cast.
Can you outline the differences between these two?
Contrary to what the other answers state, from a practical point of view there is a huge difference, although there might not be such a difference in the standard.
From the standard's point of view, reinterpret_cast is only guaranteed to work for round-trip conversions, and only if the alignment requirements of the intermediate pointer type are not stricter than those of the source type. You are not allowed (*) to write through one pointer type and read through another pointer type.
At the same time, the standard places a similar restriction on unions: it is undefined behavior to read from a union member other than the active one (the member that was last written to) (+).
Yet compilers often provide additional guarantees for the union case, and all compilers I know of (VS, g++, clang++, xlC_r, intel, Solaris CC) guarantee that you can read from a union through an inactive member and that it will produce a value with exactly the same bits set as those that were written through the active member.
This is particularly important at high optimization levels when reading data from the network:
double ntohdouble(const char *buffer) { // [1]
    union {
        int64_t i;
        double f;
    } data;
    memcpy(&data.i, buffer, sizeof(int64_t));
    data.i = ntohll(data.i);
    return data.f;
}
double ntohdouble(const char *buffer) { // [2]
    int64_t data;
    double dbl;
    memcpy(&data, buffer, sizeof(int64_t));
    data = ntohll(data);
    dbl = *reinterpret_cast<double*>(&data);
    return dbl;
}
The implementation in [1] is sanctioned by all compilers I know (gcc, clang, VS, sun, ibm, hp), while the implementation in [2] is not and will fail horribly in some of them when aggressive optimizations are used. In particular, I have seen gcc reorder the instructions and read into the dbl variable before evaluating ntohll, thus producing the wrong results.
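For comparison, a third variant that relies on neither the union extension nor the pointer cast is to copy the bytes with memcpy. This is only a minimal sketch: the helper names bits_to_double, load_be64 and ntohdouble_memcpy are hypothetical, load_be64 is a hand-written stand-in for the non-standard ntohll, and the code assumes a 64-bit IEEE 754 double.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the bits of a 64-bit integer as a double
// by copying bytes, which avoids both the inactive-member read and the
// pointer cast.
double bits_to_double(std::uint64_t bits) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Hypothetical stand-in for ntohll: assemble a big-endian (network order)
// 64-bit value byte by byte, independent of the host's endianness.
std::uint64_t load_be64(const unsigned char *buffer) {
    std::uint64_t result = 0;
    for (int i = 0; i < 8; ++i) {
        result = (result << 8) | buffer[i];
    }
    return result;
}

// Same job as ntohdouble above, written without type punning.
double ntohdouble_memcpy(const unsigned char *buffer) {
    return bits_to_double(load_be64(buffer));
}
```

Reading the buffer through unsigned char is covered by the exception in (*), and compilers typically optimize the fixed-size memcpy into a plain register move.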
(*) With the exception that you are always allowed to read through a [signed|unsigned] char* regardless of what the real object (original pointer type) was.
(+) Again with some exceptions: if the active member shares a common prefix (a common initial sequence) with another member, you can read that prefix through the other, compatible member.
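The exception in (+) can be sketched as follows. The types Ping, Ack and Message and the function sequence_of are hypothetical names chosen for illustration; the point is only that both structs are standard-layout and start with the same member types in the same order.

```cpp
// Two hypothetical standard-layout message structs that begin with the
// same member types in the same order (a common initial sequence).
struct Ping {
    unsigned char type;
    unsigned short seq;
    unsigned int payload;
};

struct Ack {
    unsigned char type;
    unsigned short seq;
    unsigned char status;
};

union Message {
    Ping ping;
    Ack ack;
};

// Reads the shared prefix through `ack`, even when `ping` is the active
// member. Only `type` and `seq` may be inspected this way; `status` may not.
unsigned short sequence_of(const Message &m) {
    return m.ack.seq;
}
```

After writing m.ping, calling sequence_of(m) reads the seq that was stored through ping, because both members agree on the layout of that prefix.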
There are some technical differences between a proper union and (let's assume) a proper and safe reinterpret_cast. However, I can't think of any of these differences that cannot be overcome.
The real reason to prefer a union over reinterpret_cast, in my opinion, isn't a technical one. It's for documentation.
Supposing you are designing a bunch of classes to represent a wire protocol (which I guess is the most common reason to use type-punning in the first place), and that wire protocol consists of many messages, submessages and fields. If some of those fields are common, such as msg type, seq#, etc, using a union simplifies tying these elements together and helps to document exactly how the protocol appears on the wire.
Using reinterpret_cast does the same thing, obviously, but in order to really know what's going on you have to examine the code that advances from one packet to the next. Using a union, you can just take a look at the header and get an idea of what's going on.
In C++11, a union is a class type and can hold a member with non-trivial member functions. You can't simply cast from one member to another.
§ 9.5.3
[ Example: Consider the following union:
union U {
int i;
float f;
std::string s;
};
Since std::string (21.3) declares non-trivial versions of all of the special member functions, U will have an implicitly deleted default constructor, copy/move constructor, copy/move assignment operator, and destructor. To use U, some or all of these member functions must be user-provided. — end example ]
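As a rough illustration of what "user-provided" means in practice, here is one possible way to make the union from the example usable. It is a sketch, not the only approach: the wrapper struct Value, its Kind tag and the set_string/set_int helpers are hypothetical, and copying is simply disabled to keep it short.

```cpp
#include <new>
#include <string>

// Hypothetical tagged wrapper around the union from the example: the
// enclosing struct records which member is active and manages lifetimes.
struct Value {
    enum class Kind { Int, Float, String };

    union U {
        int i;
        float f;
        std::string s;
        U() : i(0) {}  // user-provided: starts with `i` as the active member
        ~U() {}        // user-provided: the owner destroys `s` when needed
    };

    Kind kind = Kind::Int;
    U u;

    Value() = default;
    Value(const Value &) = delete;             // copying omitted in this sketch
    Value &operator=(const Value &) = delete;

    ~Value() { reset(); }

    void set_string(const std::string &text) {
        reset();
        new (&u.s) std::string(text);  // placement new begins the string's lifetime
        kind = Kind::String;
    }

    void set_int(int value) {
        reset();
        u.i = value;
        kind = Kind::Int;
    }

private:
    // Ends the lifetime of the string if it is active, then falls back to `i`.
    void reset() {
        if (kind == Kind::String) {
            u.s.~basic_string();  // explicit destructor call
        }
        u.i = 0;
        kind = Kind::Int;
    }
};
```

Switching the active member here goes through an explicit destructor call and placement new, which is exactly the kind of bookkeeping a plain cast between members would skip.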