
Reinterpreting a union to a different union

I have a standard-layout union that has a whole bunch of types in it:

    union Big {
        Hdr h;

        A a;
        B b;
        C c;
        D d;
        E e;
        F f;
    };

Each of the types A thru F is standard-layout and has as its first member an object of type Hdr. The Hdr identifies what the active member of the union is, so this is variant-like. Now, I'm in a situation where I know for certain (because I checked) that the active member is either a B or a C. Effectively, I've reduced the space to:

    union Little {
        Hdr h;

        B b;
        C c;
    };

Now, is the following well-defined or undefined behavior?

    void given_big(Big const& big) {
        switch(big.h.type) {
        case B::type: // fallthrough
        case C::type:
            given_b_or_c(reinterpret_cast<Little const&>(big));
            break;
        // ... other cases here ...
        }
    }

    void given_b_or_c(Little const& little) {
        if (little.h.type == B::type) {
            use_a_b(little.b);
        } else {
            use_a_c(little.c);
        }
    }

The goal of Little is to effectively serve as documentation, that I've already checked that it's a B or C so in the future nobody adds code to check that it's an A or something.

Is the fact that I am reading the B subobject as a B enough to make this well-defined? Can the common initial sequence rule meaningfully be used here?
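For reference, here is a minimal sketch of the kind of read the common initial sequence rule does cover: inspecting the shared leading Hdr through an inactive member of the same union. The definitions of Hdr, B and C and the helpers type_via_c and make_little_b are assumptions made up for illustration; the question does not show them.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the question's types; only the layout matters.
struct Hdr { int type; };
struct B { static constexpr int type = 1; Hdr h; int payload; };
struct C { static constexpr int type = 2; Hdr h; double payload; };

union Little {
    Hdr h;
    B b;
    C c;
};

static_assert(std::is_standard_layout<B>::value, "B must be standard-layout");
static_assert(std::is_standard_layout<C>::value, "C must be standard-layout");

// B and C both begin with an Hdr, so that member is part of their common
// initial sequence. Reading it through c is permitted even while b is the
// active member, because both live in the same union.
int type_via_c(Little const& little) {
    return little.c.h.type;
}

// Hypothetical helper: builds a Little whose active member is b.
Little make_little_b(int payload) {
    Little little;
    little.b = B{ {B::type}, payload };
    return little;
}
```

This only covers members of one and the same union object. It says nothing about viewing a Big as a Little, which is the part the question is asking about.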

Barry asked Feb 14 '18 14:02




2 Answers

To take a pointer to one object and reinterpret it as a pointer to another, the two objects must be pointer-interconvertible.

Pointer-interconvertible is about objects, not types of objects.

In C++, there are objects at places. If you have a Big at a particular spot with at least one member existing, there is also a Hdr at that same spot due to pointer-interconvertibility.

However there is no Little object at that spot. If there is no Little object there, it cannot be pointer-interconvertible with a Little object that isn't there.
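As a rough sketch of the part that does work according to this reasoning, the Hdr at the start of a Big can be reached with a cast, because an Hdr object really does live there. The type definitions and the helpers header_of and make_big_b are hypothetical, written only for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the question's types.
struct Hdr { int type; };
struct B { static constexpr int type = 1; Hdr h; int payload; };
struct C { static constexpr int type = 2; Hdr h; double payload; };

union Big {
    Hdr h;
    B b;
    C c;
};

static_assert(std::is_standard_layout<Big>::value, "Big must be standard-layout");
static_assert(std::is_standard_layout<B>::value, "B must be standard-layout");

// A union is pointer-interconvertible with its members, and a standard-layout
// struct with its first member, so an Hdr exists at the address of the Big.
Hdr const& header_of(Big const& big) {
    return reinterpret_cast<Hdr const&>(big);
}

// Hypothetical helper: builds a Big whose active member is b.
Big make_big_b(int payload) {
    Big big;
    big.b = B{ {B::type}, payload };
    return big;
}
```

The same cast to Little const& has no such object to land on, which is the distinction drawn above.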

They appear to be layout-compatible, assuming they are flat data (plain old data, trivially copyable, etc).

This means you can copy their byte representation and it works. In fact, optimizers seem to understand that a memcpy to a stack local buffer, a placement new (with trivial constructor), then a memcpy back is actually a noop.
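A minimal sketch of the out-of-place version of that copy, assuming flat types as described above. The definitions of Hdr, A, B and C and the helpers copy_to_little and make_big_b are assumptions for illustration, and the header check reads big.h.type the same way the question's code does.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins for the question's types.
struct Hdr { int type; };
struct A { static constexpr int type = 0; Hdr h; char text[32]; };
struct B { static constexpr int type = 1; Hdr h; int payload; };
struct C { static constexpr int type = 2; Hdr h; double payload; };

union Big    { Hdr h; A a; B b; C c; };
union Little { Hdr h; B b; C c; };

static_assert(std::is_trivially_copyable<Big>::value, "flat data only");
static_assert(std::is_trivially_copyable<Little>::value, "flat data only");
static_assert(sizeof(Little) <= sizeof(Big), "Little must fit inside Big");

// Copies the leading bytes of a Big into a separate Little object.
// Precondition: the caller has checked that the Big holds a B or a C.
Little copy_to_little(Big const& big) {
    assert(big.h.type == B::type || big.h.type == C::type);
    Little little;
    std::memcpy(&little, &big, sizeof(Little));
    return little;
}

// Hypothetical helper: builds a Big whose active member is b.
Big make_big_b(int payload) {
    Big big;
    big.b = B{ {B::type}, payload };
    return big;
}
```

The cost is a real copy, so changes made to the Little are not seen by the original Big.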

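    #include <cstring>     // std::memcpy
    #include <new>         // placement new
    #include <type_traits> // std::is_pod (deprecated in C++20)

    template<class T>
    T* laundry_pod( void* data ) {
      static_assert( std::is_pod<T>{}, "POD only" ); // could be relaxed a bit
      char buff[sizeof(T)];
      std::memcpy( buff, data, sizeof(T) ); // save the bytes
      T* r = ::new( data ) T;               // start the lifetime of a T in place
      std::memcpy( data, buff, sizeof(T) ); // restore the bytes
      return r;
    }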

the above function is a noop at runtime (in an optimized build), yet it converts T-layout-compatible data at data to an actual T.

So, if I am right and Big and Little are layout-compatible when the active member of Big is one of the types in Little, you can do this:

    Little* inplace_to_little( Big* big ) {
      return laundry_pod<Little>(big);
    }
    Big* inplace_to_big( Little* little ) {
      return laundry_pod<Big>(little);
    }

or

    void given_big(Big& big) { // cannot be const
      switch(big.h.type) {
      case B::type: // fallthrough
      case C::type: { // braces so the declaration below does not leak into later cases
        auto* little = inplace_to_little(&big); // replace Big object with Little inplace
        given_b_or_c(*little);
        inplace_to_big(little); // revive Big object.  Old references are valid, barring const data or inheritance
        break;
      }
      // ... other cases here ...
      }
    }

If Big has non-flat data (like references or const data), the above breaks horribly.

Note that laundry_pod doesn't do any memory allocation; it uses placement new that constructs a T in the place where data points using the bytes at data. And while it looks like it is doing lots of stuff (copying memory around), it optimizes to a noop.
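A compilable sketch of the whole round trip, under the assumption that the types are flat. The definitions of Hdr, A, B and C and the helper round_trip_demo are hypothetical, and the static_assert uses std::is_trivial plus std::is_standard_layout in place of std::is_pod, which is deprecated in C++20.

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical stand-ins for the question's types.
struct Hdr { int type; };
struct A { static constexpr int type = 0; Hdr h; char text[32]; };
struct B { static constexpr int type = 1; Hdr h; int payload; };
struct C { static constexpr int type = 2; Hdr h; double payload; };

union Big    { Hdr h; A a; B b; C c; };
union Little { Hdr h; B b; C c; };

template <class T>
T* laundry_pod(void* data) {
    static_assert(std::is_trivial<T>::value && std::is_standard_layout<T>::value,
                  "flat data only");
    unsigned char buff[sizeof(T)];
    std::memcpy(buff, data, sizeof(T)); // save the bytes
    T* r = ::new (data) T;              // start the lifetime of a T in place
    std::memcpy(data, buff, sizeof(T)); // restore the bytes
    return r;
}

// Hypothetical demo: builds a Big holding a B, increments the payload through
// a Little view, revives the Big, and returns the payload seen through it.
int round_trip_demo(int start) {
    Big big;
    big.b = B{ {B::type}, start };

    Little* little = laundry_pod<Little>(&big); // Big ends, Little begins
    assert(little->h.type == B::type);
    little->b.payload += 1;

    Big* revived = laundry_pod<Big>(little);    // Little ends, Big begins again
    assert(static_cast<void*>(revived) == static_cast<void*>(&big)); // same storage
    return revived->b.payload;
}
```

The pointer comparison in the demo checks the claim above: the object is rebuilt in the storage it already occupied, and nothing is allocated.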


C++ has a concept of "an object exists". The existence of an object has almost nothing to do with what bits or bytes are written in the physical or abstract machine. There is no instruction in your binary that corresponds to "now an object exists".

But the language has this concept.

Objects that don't exist cannot be interacted with. If you do so, the C++ standard does not define the behavior of your program.

This permits the optimizer to make assumptions about what your code does and what it doesn't do and which branches cannot be reached and which can be reached. It lets the compiler make no-aliasing assumptions; modifying data through a pointer or reference to A cannot change data reached through a pointer or reference to B unless somehow both A and B exist in the same spot.

The compiler can prove that Big and Little objects cannot both exist in the same spot. So no modification of any data through a pointer or reference to Little could modify anything existing in a variable of type Big. And vice versa.

Imagine if given_b_or_c modifies a field. Well the compiler could inline given_big and given_b_or_c and use_a_b, notice that no instance of Big is modified (just an instance of Little), and prove that fields of data from Big it cached prior to calling your code could not be modified.

This saves it a load instruction, and the optimizer is quite happy. But now you have code that reads:

    Big b = whatever;
    b.foo = 7;
    ((Little&)b).foo = 4;
    if (b.foo!=4) exit(-1);

that is optimized to

    Big b = whatever;
    b.foo = 7;
    ((Little&)b).foo = 4;
    exit(-1);

because it can prove that b.foo must be 7: it was set once and never modified. The access through Little could not modify the Big due to aliasing rules.

Now do this:

    Big b = whatever;
    b.foo = 7;
    (*laundry_pod<Little>(&b)).foo = 4;
    Big& b2 = *laundry_pod<Big>(&b);
    if (b2.foo!=4) exit(-1);

and it cannot assume that the Big there was unchanged, because there is a memcpy and a ::new that could legally change the state of the data. No strict aliasing violation.

It can still follow the memcpy and eliminate it.

Live example of laundry_pod being optimized away. Note that if it wasn't optimized away, the code would have to have a conditional and a printf. But because it was, it was optimized into the empty program.

Yakk - Adam Nevraumont answered Sep 23 '22 21:09


I can find no wording in n4296 (draft C++14 standard) which would make this legal. What is more, I cannot even find any wording that given:

    union Big2 {
        Hdr h;

        A a;
        B b;
        C c;
        D d;
        E e;
        F f;
    };

we can reinterpret_cast a reference to Big into a reference to Big2 and then use the reference. (Note that Big and Big2 are layout-compatible.)

Martin Bonner supports Monica answered Sep 24 '22 21:09