Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aliasing T* with char* is allowed. Is it also allowed the other way around?

Note: This question has been renamed and reduced to make it more focused and readable. Most of the comments refer to the old text.


According to the standard, objects of different type may not share the same memory location. So this would not be legal:

std::array<short, 4> shorts; int* i = reinterpret_cast<int*>(shorts.data()); // Not OK 

The standard, however, allows an exception to this rule: any object may be accessed through a pointer to char or unsigned char:

int i = 0; char * c = reinterpret_cast<char*>(&i); // OK 

However, it is not clear to me whether this is also allowed the other way around. For example:

char * c = read_socket(...); unsigned * u = reinterpret_cast<unsigned*>(c); // huh? 
like image 649
StackedCrooked Avatar asked Sep 27 '12 00:09

StackedCrooked


People also ask

What is the strict aliasing rule and why do we care?

"Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"

What are aliases in c++?

You can use an alias declaration to declare a name to use as a synonym for a previously declared type. (This mechanism is also referred to informally as a type alias). You can also use this mechanism to create an alias template, which can be useful for custom allocators.


2 Answers

Some of your code is questionable due to the pointer conversions involved. Keep in mind that in those instances reinterpret_cast<T*>(e) has the semantics of static_cast<T*>(static_cast<void*>(e)) because the types that are involved are standard-layout. (I would in fact recommend that you always use static_cast via cv void* when dealing with storage.)

A close reading of the Standard suggests that during a pointer conversion to or from T* it is assumed that there really is an actual object T* involved -- which is hard to fulfill in some of your snippet, even when 'cheating' thanks to the triviality of types involved (more on this later). That would be besides the point however because...

Aliasing is not about pointer conversions. This is the C++11 text that outlines the rules that are commonly referred to as 'strict aliasing' rules, from 3.10 Lvalues and rvalues [basic.lval]:

10 If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.

(This is paragraph 15 of the same clause and subclause in C++03, with some minor changes in the text with e.g. 'lvalue' being used instead of 'glvalue' since the latter is a C++11 notion.)

In the light of those rules, let's assume that an implementation provides us with magic_cast<T*>(p) which 'somehow' converts a pointer to another pointer type. Normally this would be reinterpret_cast, which yields unspecified results in some cases, but as I've explained before this is not so for pointers to standard-layout types. Then it's plainly true that all of your snippets are correct (substituting reinterpret_cast with magic_cast), because no glvalues are involved whatsoever with the results of magic_cast.

Here is a snippet that appears to incorrectly use magic_cast, but which I will argue is correct:

// assume constexpr max constexpr auto alignment = max(alignof(int), alignof(short)); alignas(alignment) char c[sizeof(int)]; // I'm assuming here that the OP really meant to use &c and not c // this is, however, inconsequential auto p = magic_cast<int*>(&c); *p = 42; *magic_cast<short*>(p) = 42; 

To justify my reasoning, assume this superficially different snippet:

// alignment same as before alignas(alignment) char c[sizeof(int)];  auto p = magic_cast<int*>(&c); // end lifetime of c c.~decltype(c)(); // reuse storage to construct new int object new (&c) int;  *p = 42;  auto q = magic_cast<short*>(p); // end lifetime of int object p->~decltype(0)(); // reuse storage again new (p) short;  *q = 42; 

This snippet is carefully constructed. In particular, in new (&c) int; I'm allowed to use &c even though c was destroyed due to the rules laid out in paragraph 5 of 3.8 Object lifetime [basic.life]. Paragraph 6 of same gives very similar rules to references to storage, and paragraph 7 explains what happens to variables, pointers and references that used to refer to an object once its storage is reused -- I will refer collectively to those as 3.8/5-7.

In this instance &c is (implicitly) converted to void*, which is one of the correct use of a pointer to storage that has not been yet reused. Similarly p is obtained from &c before the new int is constructed. Its definition could perhaps be moved to after the destruction of c, depending on how deep the implementation magic is, but certainly not after the int construction: paragraph 7 would apply and this is not one of the allowed situations. The construction of the short object also relies on p becoming a pointer to storage.

Now, because int and short are trivial types, I don't have to use the explicit calls to destructors. I don't need the explicit calls to the constructors, either (that is to say, the calls to the usual, Standard placement new declared in <new>). From 3.8 Object lifetime [basic.life]:

1 [...] The lifetime of an object of type T begins when:

  • storage with the proper alignment and size for type T is obtained, and
  • if the object has non-trivial initialization, its initialization is complete.

The lifetime of an object of type T ends when:

  • if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or
  • the storage which the object occupies is reused or released.

This means that I can rewrite the code such that, after folding the intermediate variable q, I end up with the original snippet.

Do note that p cannot be folded away. That is to say, the following is defintively incorrect:

alignas(alignment) char c[sizeof(int)]; *magic_cast<int*>(&c) = 42; *magic_cast<short*>(&c) = 42; 

If we assume that an int object is (trivially) constructed with the second line, then that must mean &c becomes a pointer to storage that has been reused. Thus the third line is incorrect -- although due to 3.8/5-7 and not due to aliasing rules strictly speaking.

If we don't assume that, then the second line is a violation of aliasing rules: we're reading what is actually a char c[sizeof(int)] object through a glvalue of type int, which is not one of the allowed exception. By comparison, *magic_cast<unsigned char>(&c) = 42; would be fine (we would assume a short object is trivially constructed on the third line).

Just like Alf, I would also recommend that you explicitly make use of the Standard placement new when using storage. Skipping destruction for trivial types is fine, but when encountering *some_magic_pointer = foo; you're very much likely facing either a violation of 3.8/5-7 (no matter how magically that pointer was obtained) or of the aliasing rules. This means storing the result of the new expression, too, since you most likely can't reuse the magic pointer once your object is constructed -- due to 3.8/5-7 again.

Reading the bytes of an object (this means using char or unsigned char) is fine however, and you don't even to use reinterpret_cast or anything magic at all. static_cast via cv void* is arguably fine for the job (although I do feel like the Standard could use some better wording there).

like image 169
Luc Danton Avatar answered Sep 23 '22 10:09

Luc Danton


This too:

// valid: char -> type alignas(int) char c[sizeof(int)]; int * i = reinterpret_cast<int*>(c); 

That is not correct. The aliasing rules state under which circumstances it is legal/illegal to access an object through an lvalue of a different type. There is an specific rule that says that you can access any object through a pointer of type char or unsigned char, so the first case is correct. That is, A => B does not necessarily mean B => A. You can access an int through a pointer to char, but you cannot access a char through a pointer to int.


For the benefit of Alf:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or non- static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.
like image 36
David Rodríguez - dribeas Avatar answered Sep 21 '22 10:09

David Rodríguez - dribeas