
Is reinterpret_cast type punning actually undefined behavior?

It appears to be widely held that type punning via reinterpret_cast is somehow prohibited in C++ (properly: "undefined behavior", that is, "behavior for which this International Standard imposes no requirements", with an explicit note that implementations may define behavior). Am I incorrect in using the following reasoning to disagree, and if so, why?


[expr.reinterpret.cast]/11 states:

A glvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result refers to the same object as the source glvalue, but with the specified type. [ Note: That is, for lvalues, a reference cast reinterpret_cast<T&>(x) has the same effect as the conversion *reinterpret_cast<T*>(&x) with the built-in & and * operators (and similarly for reinterpret_cast<T&&>(x)). — end note ] No temporary is created, no copy is made, and constructors or conversion functions are not called.

with the footnote:

75) This is sometimes referred to as a type pun.

/11 implicitly, via example, carries the restrictions of /6 through /10, but perhaps the most common usage (punning objects) is addressed in [expr.reinterpret.cast]/7:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)). [ Note: Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. — end note ]

Clearly the purpose cannot be conversion to/from pointers or references to void, as:

  1. the example in /7 clearly demonstrates that static_cast should suffice in the case of pointers, as do [expr.static.cast]/13 and [conv.ptr]/2; and
  2. [conversions to] references to void are prima facie invalid.

Further, [basic.lval]/8 states:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

(8.1) the dynamic type of the object,

(8.2) a cv-qualified version of the dynamic type of the object,

(8.3) a type similar to the dynamic type of the object,

(8.4) a type that is the signed or unsigned type corresponding to the dynamic type of the object,

(8.5) a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,

(8.6) an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

(8.7) a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,

(8.8) a char, unsigned char, or std::byte type.

And if we return to [expr.reinterpret.cast]/11 for a moment, we see "The result refers to the same object as the source glvalue, but with the specified type." This reads to me as an explicit statement that the result of reinterpret_cast<T&>(v) is an lvalue reference to an object of type T, to which access is clearly "through a glvalue of" "the dynamic type of the object". This sentence also addresses the argument that various paragraphs of [basic.life] apply via the spurious claim that the results of such conversions refer to a new object of type T, the lifetime of which has not yet begun, which just happens to reside at the same memory address as v.

It seems nonsensical to explicitly define such conversions only to disallow standard-defined use of the results, particularly in light of footnote 75 noting that such [reference] conversion is "sometimes referred to as a type pun."

Note that my references are to the final publicly-available draft for C++17 (N4659), but the language in question is little-changed from N3337 (C++11) through N4788 (C++20 WD) (tip link will likely refer to later drafts in time). In fact the footnote to [expr.reinterpret.cast]/11 is made even more explicit in the most recent draft:

This is sometimes referred to as a type pun when the result refers to the same object as the source glvalue.

pandorafalters asked Jan 01 '19 13:01



3 Answers

I believe your misunderstanding lies here:

This reads to me as an explicit statement that the result of reinterpret_cast<T&>(v) is an lvalue reference to an object of type T, to which access is clearly "through a glvalue of" "the dynamic type of the object".

[basic.lval]/8 is a bit misleading because it talks about the dynamic type "of the object" when the dynamic type is actually a property of the glvalue [defns.dynamic.type] used to access the object rather than the object itself. Essentially, the dynamic type of the glvalue is the type of the object that is currently living in the place that the glvalue refers to (effectively, the type of the object that was constructed/initialized in that piece of memory) [intro.object]/6. For example:

float my_float = 42.0f;
std::uint32_t& ui = reinterpret_cast<std::uint32_t&>(my_float);

Here, ui is a reference that refers to the object created by the definition of my_float. Accessing this object through the glvalue ui would invoke undefined behavior per [basic.lval]/8, however, because the dynamic type of the glvalue is float while the type of the glvalue is std::uint32_t, which is none of the types that paragraph lists.
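For comparison, here is a minimal sketch of one way to obtain the bits of my_float without the aliasing access: std::memcpy copies the object representation into a separate std::uint32_t object. The helper names float_bits, bits_to_float, and demo_float_bits are hypothetical and not part of the answer, and the sketch assumes that float is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption of this sketch: float and std::uint32_t have the same size.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: copies the bytes of a float into a fresh uint32_t.
// No glvalue of type std::uint32_t ever refers to the float object.
std::uint32_t float_bits(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Hypothetical inverse helper: copies the bytes back into a float.
float bits_to_float(std::uint32_t bits) {
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Usage: the bytes survive the trip through the integer unchanged.
void demo_float_bits() {
    float my_float = 42.0f;
    std::uint32_t ui = float_bits(my_float);
    assert(bits_to_float(ui) == my_float);
}
```

Each access here goes through a glvalue whose type matches the object it names, so the restriction in [basic.lval]/8 is not triggered.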

There are few valid uses of a reinterpret_cast like that, but use cases other than just casting to void* and back exist (for the latter, static_cast would be sufficient, as you noted yourself). [basic.lval]/8 effectively gives you a complete list of what they are. For example, it would be valid to examine (and even copy if the dynamic type of the object is trivially-copyable [basic.types]/9) the value of an object by casting the address of the object to char*, unsigned char*, or std::byte* (not signed char*, however). It would be valid to reinterpret_cast an object of signed type to access it as its corresponding unsigned type and vice versa. It would also be valid to cast a pointer/reference to a union to a pointer/reference to a member of that union and access that member through the resulting lvalue if that member is the active member of the union…
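The byte-inspection case mentioned above might look like the following sketch. The names object_bytes and demo_object_bytes are hypothetical. Note that another answer to this question argues that the C++17 wording does not formally guarantee even this, so treat it as an illustration of the commonly accepted reading rather than a settled guarantee.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical helper: copies out the object representation of a
// trivially copyable object by reading it through unsigned char lvalues,
// which is one of the types listed in [basic.lval]/8.8.
template <typename T>
std::vector<unsigned char> object_bytes(const T& object) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only handles trivially copyable types");
    const unsigned char* first =
        reinterpret_cast<const unsigned char*>(&object);
    return std::vector<unsigned char>(first, first + sizeof(T));
}

// Usage: one byte is produced per byte of the inspected object.
void demo_object_bytes() {
    float my_float = 42.0f;
    std::vector<unsigned char> bytes = object_bytes(my_float);
    assert(bytes.size() == sizeof(float));
}
```

The byte values themselves depend on the platform's representation of float, so the sketch only checks the size.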

The main reason why type punning through casts like this is undefined in general is that making it defined behavior would prohibit some extremely vital compiler optimizations. If any object of any type could simply be accessed through an lvalue of any other type, the compiler would have to assume that any modification of an object through some lvalue can potentially affect the value of any object in the program unless it can prove otherwise. As a result, it would basically be impossible, for example, to keep values in registers for any useful period of time, because any modification of anything would immediately invalidate whatever is held in registers at the moment. Yes, any good optimizer will perform aliasing analysis. But while such methods certainly work and are powerful, they can, in principle, only cover a subset of cases. Proving or disproving aliasing in general is basically impossible (equivalent to solving the halting problem, I would think)…
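A small sketch of the kind of optimization at stake, using hypothetical functions store_both and store_with_byte that are not from the answer. Whether a particular compiler actually performs the folding depends on the compiler and its flags. The point is only what the rules permit it to assume.

```cpp
#include <cassert>

// Hypothetical example: a float lvalue may not be used to modify an int
// object, so the compiler is allowed to assume the store through f leaves
// *i untouched and to return the constant 1 without reloading *i.
int store_both(int* i, float* f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}

// Hypothetical contrast: unsigned char may alias any object, so here the
// compiler has to allow for c pointing into *i and reload the value.
int store_with_byte(int* i, unsigned char* c) {
    *i = 1;
    *c = 0;
    return *i;
}

// Usage with distinct, correctly typed objects, which is well-defined.
void demo_aliasing() {
    int a = 0;
    float b = 0.0f;
    unsigned char c = 1;
    assert(store_both(&a, &b) == 1);
    assert(store_with_byte(&a, &c) == 1);
}
```

Calling store_both with both pointers aimed at the same storage is exactly the access that [basic.lval]/8 leaves undefined, which is what frees the compiler from having to handle it.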

Michael Kenzel answered Nov 11 '22 09:11


[basic.lval]/8 says when the behavior will surely be undefined, but this does not necessarily mean that if you do something from the list in [basic.lval]/8 the behavior will be defined.

[basic.lval]/8 hasn't changed much since C++98, and it has inaccurate wording such as the use of the undefined term "dynamic type of the object" (C++ defines dynamic types for expressions).

Whether the behavior is defined when you do something allowed by [basic.lval]/8 depends on other parts of the standard. Even if it might be agreed that the result of signed/unsigned reinterpretation could be derived from wording in [basic.types], I can't imagine how it is possible to predict the result of an access to an object containing references or virtual methods through a char glvalue.

C++17's new pointer and glvalue casting rules made [basic.lval]/8 even less useful, because it is now not formally possible to achieve the aims [basic.lval]/8 was intended to guarantee (for example, reading the bytes of an object through a char glvalue). As you pointed out, per [expr.reinterpret.cast]/11, after a reinterpret_cast to a reference to T, the resulting glvalue still refers to the object that the argument of the reinterpret_cast referred to.

Per [conv.lval]/(3.4), the result of the lvalue-to-rvalue conversion is the value contained in the object to which the converted glvalue refers. For example, these rules mean that the result of the lvalue-to-rvalue conversion applied to reinterpret_cast<char&>(i), where i is an int variable, is the value stored in the int object i. The type of the prvalue is char ([conv.lval]/1), and if the value of i is not representable by char, the behavior is undefined according to [expr]/4. Trying to read an int object through a char glvalue will therefore result in UB if the value of the object is not representable by char, even though this access is "allowed" by [basic.lval]/(8.8). This proves what was said in the first paragraph.

Language Lawyer answered Nov 11 '22 09:11


A reference built using reinterpret_cast (I include casting a pointer and then dereferencing) can be roundtripped to the original type, if the intermediate types had equal or less stringent alignment requirements.
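A minimal sketch of such a round trip, with hypothetical names round_trip and demo_round_trip. It uses unsigned char as the intermediate type because its alignment requirement is no stricter than that of int.

```cpp
#include <cassert>

// Hypothetical helper: converts an int pointer to an unrelated pointer
// type and back. The intermediate pointer is never dereferenced.
int* round_trip(int* original) {
    unsigned char* intermediate = reinterpret_cast<unsigned char*>(original);
    return reinterpret_cast<int*>(intermediate);
}

// Usage: the pointer and the reference both end up naming the original
// int, and the final access is through the object's own type.
void demo_round_trip() {
    int value = 7;

    int* back = round_trip(&value);
    assert(back == &value);
    assert(*back == 7);

    // The same idea expressed with references instead of pointers.
    int& ref = reinterpret_cast<int&>(reinterpret_cast<unsigned char&>(value));
    assert(&ref == &value);
    assert(ref == 7);
}
```

Only the final lvalue of type int is used to read the object, so no access through a mismatched type takes place.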

Most other uses are undefined behavior due to the strict aliasing rule. (No language citation is needed because the question already quotes it.)

Notable legal cases where the final type of the expression does not match the object's dynamic type include aliasing through narrow character types and the common initial sequence rule for structures.

Ben Voigt answered Nov 11 '22 11:11