I stumbled upon a Reddit thread in which a user found an interesting detail of the C++ standard. The thread did not spawn much constructive discussion, so I will retell my understanding of the problem here:
- `memcpy` in a standard-compliant way
- `reinterpret_cast<char*>(&foo)`, which is an allowed exception to the strict aliasing restrictions, in which reinterpreting as `char` is allowed to access the "object representation" of an object.
- `static_cast<cv T*>(static_cast<cv void*>(v))`, so `reinterpret_cast` in this case is equivalent to `static_cast`ing first to `void*` and then to `char*`.
- "A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. [...] if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. [...]" [emphasis mine]
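The equivalence between the two spellings of the cast can be checked with a minimal sketch. It assumes a union `Foo` with a `char` and an `int` member; the helper names `casts_agree` and `demo_casts` are hypothetical. The code only compares pointer values and never reads bytes through them, so it says nothing about which object the resulting `char*` formally points to.

```cpp
#include <cassert>

// Assumed example type: a union with a char and an int member.
union Foo {
    char c;
    int i;
};

// Hypothetical helper: returns true if both spellings of the cast
// produce pointers that compare equal.
bool casts_agree(Foo& foo) {
    char* via_reinterpret = reinterpret_cast<char*>(&foo);
    char* via_static = static_cast<char*>(static_cast<void*>(&foo));
    return via_reinterpret == via_static;
}

// Hypothetical usage: the comparison is expected to hold for any Foo.
void demo_casts() {
    Foo foo{};
    assert(casts_agree(foo));
}
```

Calling `demo_casts()` exercises the comparison; no byte of `foo` is accessed through either pointer.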
Consider now the following union class:

    union Foo {
        char c;
        int i;
    };
    // the OP has used union, but iiuc,
    // it can also be a struct for the problem to arise.

OP has thus come to the conclusion that reinterpreting a `Foo*` as `char*` in this case yields a pointer to the first `char` member of the union (or its object representation), rather than to the object representation of the union itself, i.e. it points only to the member. While this appears superficially to be the same, and corresponds to the same memory address, the standard seems to differentiate between the "value" of a pointer and its corresponding address, in that on the abstract C++ machine, a pointer belongs to a certain object only. Incrementing it beyond that object (compare with `end()` of an array) is undefined behavior.
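The "same memory address" part of this argument can be illustrated with a small sketch, assuming the `Foo` union from above; `same_address` and `demo_same_address` are hypothetical names. It compares addresses only, which is exactly why it cannot tell the two pointer values apart at the level of the abstract machine.

```cpp
#include <cassert>

// Assumed example type, repeated so the sketch is self-contained.
union Foo {
    char c;
    int i;
};

// Hypothetical helper: compares the address of the union with the
// address of its char member. Nothing is read through either pointer.
bool same_address(Foo& foo) {
    void* whole = &foo;     // address of the union object
    void* member = &foo.c;  // address of its char member
    return whole == member;
}

// Hypothetical usage: both addresses are expected to compare equal.
void demo_same_address() {
    Foo foo{};
    assert(same_address(foo));
}
```

Equal addresses do not settle the question raised here, which is whether the `char*` may be incremented past the `char` member.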
OP thus argues that if the standard forces the `char*` to be associated with the object's first member instead of the object representation of the whole union object, dereferencing it after one increment is UB, which allows a compiler to optimize as if it were impossible for the resultant `char*` to ever access the following bytes of the `int` member. This implies that it is not possible to legally access the complete object representation of a class object which is pointer-interconvertible with a `char` member.
The same would, if I understand correctly, apply if "union" were simply replaced with "struct", but I have taken this example from the original thread.
What do you think? Is this a standard defect? Is it a misinterpretation?
This video, linked in the comments (now chat) by @KonradRudolph, is likely the answer to the problem.
At around the 40-minute mark, Timur Doumler, who is a member of the ISO C++ committee, discusses the possibility of accessing byte representations. The summary is that any attempt at accessing the byte representation other than `memcpy` is UB. The situation in the OP does not even arise without invoking UB, because the very act of treating a pointer to an object like a pointer into an array, or doing any pointer arithmetic on it, is UB: as far as the abstract machine is concerned, these operations are only well-defined on actual array objects.
Also, while reinterpreting a pointer as a `char*` does not on its own violate aliasing rules, there is technically no guarantee that the resulting `char*` will point to the first byte of the object.
The only legal way of accessing byte representations is to `memcpy` the object into a `char` array. This means that reimplementing `memcpy` is impossible.
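As a minimal sketch of the `memcpy` route, assuming the `Foo` union from the question: the helpers `bytes_of`, `from_bytes` and `demo_round_trip` are hypothetical names, and `unsigned char` is used for the byte array here as an illustrative choice rather than anything prescribed by the talk.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Assumed example type, repeated so the sketch is self-contained.
union Foo {
    char c;
    int i;
};

// Hypothetical helper: copies the object representation of `foo`
// into a byte array using memcpy.
std::array<unsigned char, sizeof(Foo)> bytes_of(const Foo& foo) {
    std::array<unsigned char, sizeof(Foo)> out{};
    std::memcpy(out.data(), &foo, sizeof(Foo));
    return out;
}

// Hypothetical helper: rebuilds a Foo from previously copied bytes,
// again using memcpy.
Foo from_bytes(const std::array<unsigned char, sizeof(Foo)>& bytes) {
    Foo foo{};
    std::memcpy(&foo, bytes.data(), sizeof(Foo));
    return foo;
}

// Hypothetical usage: a round trip through the byte array keeps the value.
void demo_round_trip() {
    Foo original{};
    original.i = 42;
    Foo copy = from_bytes(bytes_of(original));
    assert(copy.i == 42);
}
```

The bytes are inspected only through the array that `memcpy` filled, never through a pointer obtained by casting `&foo`.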
Timur Doumler additionally describes this as a wording defect that will hopefully be fixed in C++23 and presents a paper that proposes a fix to this.