
union for uint32_t and uint8_t[4] undefined behavior? [duplicate]

In the comments on this answer, it is said that splitting an integer into its bytes using a union like the following would be undefined behavior. The code given there is similar, though not identical, to this; please let me know if I have changed any aspect that is relevant to undefined behavior.

#include <cstdint>  // uint8_t, uint32_t

union addr {
    uint8_t addr8[4];
    uint32_t addr32;
};

Up to now I thought this would be a fine approach to do things like addr = {127, 0, 0, 1}; and get the corresponding uint32_t in return. (I acknowledge that this may yield different results depending on the endianness of my system. The question however remains.)

Is this undefined behavior? If so, why? (I don't know what "What's UB in C++ is to access inactive union members" means.)


C99

  • C99 is apparently pretty close to C++03 on this point.

C++03

  • In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time. C++03, Section 9.5 (1), page 162

However

  • If a POD-union contains several POD-structs that share a common initial sequence [...] it is permitted to inspect the common initial sequence of any of POD-struct members ibid.
  • Two POD-struct [...] types are layout-compatible if they have the same number of nonstatic data members, and corresponding nonstatic data members (in order) have layout-compatible types C++03, Section 9.2 (14), page 157
  • If two types T1 and T2 are the same type, then T1 and T2 are layout-compatible types. C++03, Section 3.9 (11), page 53
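To make the common-initial-sequence exception quoted above concrete, here is a minimal sketch. The names `ipv4_tag`, `ipv6_tag`, `tagged_addr` and `kind_of` are made up for illustration and do not appear in the standard or in the linked answer. Both structs start with a member of the same type, so that first member forms their common initial sequence and may be inspected through either union member.

```cpp
#include <cstdint>

// Two POD-structs whose first member has the same type. The second members
// (arrays of different length) are not layout-compatible, so the common
// initial sequence consists of `kind` only.
struct ipv4_tag {
    std::uint8_t kind;
    std::uint8_t octets[4];
};

struct ipv6_tag {
    std::uint8_t kind;
    std::uint8_t octets[16];
};

union tagged_addr {
    ipv4_tag v4;
    ipv6_tag v6;
};

// Hypothetical helper: reads `kind` through v6 regardless of which struct
// is active. This relies on the common-initial-sequence rule quoted above.
std::uint8_t kind_of(const tagged_addr& a) {
    return a.v6.kind;
}
```

Note that this exception does not seem to help with the union in the question, because neither uint8_t[4] nor uint32_t is a struct.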

Conclusion

  • Since uint8_t[4] and uint32_t are not the same type (a strict aliasing thing, I guess), and neither of them is a POD-struct or POD-union, is the above indeed UB?

C++11

  • Note that aggregate type does not include union type because an object with union type can only contain one member at a time. C++11, Footnote 46, page 42
moooeeeep asked Nov 28 '22 17:11


1 Answer

Quoting the question: I don't know what "What's UB in C++ is to access inactive union members" means.

Basically, what it means is that the only member you can read from a union without invoking undefined behavior is the last written one. In other words, if you write to addr32, you can only read from addr32, not from addr8, and vice versa.
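As a rough sketch of how to stay within that rule, assuming the goal from the question (turning four bytes into a uint32_t): read only the member that was written, and build the integer from those bytes with std::memcpy or with shifts. The helper names `to_u32_native` and `to_u32_big_endian` are hypothetical and not part of the question.

```cpp
#include <cstdint>
#include <cstring>

union addr {
    std::uint8_t addr8[4];
    std::uint32_t addr32;
};

static_assert(sizeof(std::uint32_t) == 4, "expected a 4-byte uint32_t");

// Hypothetical helper. Assumes addr8 is the active (last written) member.
// Copies the four bytes into a separate uint32_t, so addr32 is never read.
// The result depends on the endianness of the machine.
std::uint32_t to_u32_native(const addr& a) {
    std::uint32_t value = 0;
    std::memcpy(&value, a.addr8, sizeof(value));
    return value;
}

// Hypothetical helper. Same assumption, but combines the bytes with shifts,
// treating addr8[0] as the most significant byte on every platform.
std::uint32_t to_u32_big_endian(const addr& a) {
    return (static_cast<std::uint32_t>(a.addr8[0]) << 24) |
           (static_cast<std::uint32_t>(a.addr8[1]) << 16) |
           (static_cast<std::uint32_t>(a.addr8[2]) << 8) |
           static_cast<std::uint32_t>(a.addr8[3]);
}
```

For example, after `addr a = {{127, 0, 0, 1}};` (which initializes addr8, the first member), both helpers only ever read addr8.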

An example is also available here.

Edit: Since there has been much discussion about whether this is UB or not, consider the following (fully valid) C++11 example:

#include <string>  // std::string, std::wstring

union olle {
    std::string str;
    std::wstring wstr;
};

Here you can definitely see that activating str and reading wstr may be a problem. You could see this as an extreme example since you even have to activate the member by doing a placement new, but the spec actually covers this case with no mention that it's to be considered a special case in other ways regarding active members.
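A minimal sketch of what "activating" a member by placement new could look like. The empty constructor and destructor and the function `demo_length` are additions for illustration only; a union with non-trivial members needs a user-provided constructor and destructor before an object of it can be created and destroyed.

```cpp
#include <cstddef>
#include <new>
#include <string>

union olle {
    std::string str;
    std::wstring wstr;
    olle() {}   // added for this sketch: no member is active after construction
    ~olle() {}  // added for this sketch: the union cannot know which member to destroy
};

// Hypothetical demo: each member is read only while it is the active one.
std::size_t demo_length() {
    olle u;
    new (&u.str) std::string("hello");  // activate str
    std::size_t n = u.str.size();       // fine: str is the active member
    u.str.~basic_string();              // end the lifetime of str

    new (&u.wstr) std::wstring(L"hi");  // now activate wstr
    n += u.wstr.size();                 // fine: wstr is the active member
    u.wstr.~basic_string();             // clean up before u goes out of scope
    return n;
}
```

Reading u.wstr between the first placement new and the first destructor call would touch a std::wstring that was never constructed, which is the problem the example above points at.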

Joachim Isaksson answered Dec 19 '22 04:12