
Where does the C++ standard describe the casting of pointers to primitives?

In the excellent blog post What Every Programmer Should Know About Undefined Behavior, the section "Violating Type Rules" says:

It is undefined behavior to cast an int* to a float* and dereference it (accessing the "int" as if it were a "float"). C requires that these sorts of type conversions happen through memcpy: using pointer casts is not correct and undefined behavior results. The rules for this are quite nuanced and I don't want to go into the details here (there is an exception for char*, vectors have special properties, unions change things, etc).

I'd like to understand the rules in their full nuancedness. Where are they in the C++11 spec? Or failing that, the C spec (C90, C99, C11)?

In the C++11 spec linked from this Stack Overflow question, N3485, I'm looking in 5.2.10 "Reinterpret cast" but don't see language for an exception for char* or unions. So that's probably not the right place. So where is the right place?

Martin C. Martin asked Feb 06 '13 13:02



1 Answer

The rule you're looking for is in §3.10/10 (in C++11):

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

- the dynamic type of the object,

- a cv-qualified version of the dynamic type of the object,

- a type similar (as defined in 4.4) to the dynamic type of the object,

- a type that is the signed or unsigned type corresponding to the dynamic type of the object,

- a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,

- an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

- a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,

- a char or unsigned char type.
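As a rough illustration of two entries in that list, the sketch below reads an int through an unsigned char* (the last bullet) and through the corresponding unsigned type (the fourth bullet). The function names are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Reads the object representation of an int one byte at a time.
// Access through unsigned char is explicitly permitted by the list above.
unsigned sum_of_bytes(const int& value) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof value; ++i) {
        sum += bytes[i];
    }
    return sum;
}

// Reads an int through the unsigned type corresponding to it,
// which the list above also permits.
unsigned read_as_unsigned(const int& value) {
    return *reinterpret_cast<const unsigned*>(&value);
}

// Usage: a non-negative int has the same representation as the
// equal unsigned value, so the value read back is unchanged.
void check_permitted_accesses() {
    int n = 5;
    assert(read_as_unsigned(n) == 5u);
}
```

Reading the same int through a float* instead would fall outside the list, which is the case the question asks about.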

There are different types (or motivations) for undefined behavior.

In the case of casting an int* to a float* and then dereferencing it, it is clear that the standard cannot define the result, since what happens depends on the architecture and on the value of the int. On the other hand, the quoted paragraph is completely wrong: using memcpy to do the conversion is also undefined behavior, for largely the same reasons.

One of the motivations for undefined behavior is to allow implementations to define it, in a manner that makes sense for the target architecture, if such exists. This is such a case. A compiler which intentionally causes it to fail is defective. Of course, if we suppose a 32-bit two's complement int and a 32-bit IEEE float, we may expect certain values of the int to correspond to a trapping NaN, which will cause the program to fail. This is part of the reason the behavior is undefined: to allow such things to happen. But if we are familiar with the low-level details of the hardware, the cast should work as expected, provided the compiler can see it. If it doesn't work, this is a quality of implementation (QoI) problem with the compiler, and such a compiler should be avoided for this type of work.

As hinted at above, this particular case, and in fact all cases that involve type punning (writing to one member of a union and reading from another, for example), pose a problem for which the standard has yet to find adequate wording. The problem occurs because the compiler is normally allowed to assume that pointers to different types (except character types) do not alias: that an int* can never point to the same object as a float*. And proving that two pointers cannot alias is important for optimization. A compiler that breaks code where the pointer cast or the union is clearly visible is just broken, even if the standard says it is undefined behavior. A compiler that breaks code where all it sees are two pointers to unrelated types is understandable, even in cases where the standard says the behavior is well defined.
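To make the aliasing assumption concrete, here is a small sketch. The name store_both is hypothetical; the point is only that the function body sees two pointers to unrelated types and nothing else.

```cpp
#include <cassert>

// All the compiler sees here is an int* and a float*. It may assume they
// refer to different objects, so it is free to return 1 directly instead
// of reloading *i after the store through f.
int store_both(int* i, float* f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}

// Well-defined usage: the two pointers really do refer to distinct objects.
void check_store_both() {
    int n = 0;
    float x = 0.0f;
    assert(store_both(&n, &x) == 1);
    assert(x == 2.0f);
}
```

Calling store_both(&n, reinterpret_cast<float*>(&n)) is the problematic case: the cast happens at the call site, out of sight of the function body, so an optimizer that keeps the 1 in a register is doing what the paragraph above calls understandable.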

Using memcpy avoids this problem by using two different objects, which don't alias. It still encounters undefined behavior because putting the bit pattern of an int into a float, then accessing the float, doesn't have any defined behavior. (Or vice-versa; I know of at least one machine where copying the bits of a float into an int may result in an illegal int value.)
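A minimal sketch of the memcpy route, going from float to an unsigned integer. It assumes float is a 32-bit IEEE 754 type, which the static_asserts check at compile time; the name float_bits is hypothetical. With std::uint32_t every 32-bit pattern is a valid value, so this direction does not run into the illegal-value concern, whereas copying arbitrary bits into a float can still yield a NaN.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumptions of this sketch, not guarantees of the language.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE 754 float");

// Copies the object representation of f into a separate integer object.
// Two distinct objects are involved, so no aliasing question arises.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Usage: under the assumptions above, 1.0f has sign 0, biased exponent 127
// and a zero fraction, which is the pattern 0x3F800000.
void check_float_bits() {
    assert(float_bits(1.0f) == 0x3F800000u);
    assert(float_bits(0.0f) == 0u);
}
```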

James Kanze answered Nov 15 '22 19:11