
C++'s Strict Aliasing Rule - Is the 'char' aliasing exemption a 2-way street?

A couple of weeks ago, I learned that the C++ Standard has a strict aliasing rule. I had asked a question about shifting bits: rather than shifting each byte one at a time, I wanted to maximize performance by loading the data into one of my processor's native registers (32 or 64 bits wide) and shifting all 4 or 8 bytes in a single instruction.

This is the code I wanted to avoid:

unsigned char buffer[] = { 0xab, 0xcd, 0xef, 0x46 };

for (int i = 0; i < 3; ++i)
{
  buffer[i] <<= 4; 
  buffer[i] |= (buffer[i + 1] >> 4);
}
buffer[3] <<= 4;

And instead, I wanted to use something like:

unsigned char buffer[] = { 0xab, 0xcd, 0xef, 0x46 };
unsigned int *p = (unsigned int*)buffer; // unsigned int is 32 bit on my platform
*p <<= 4;

Someone pointed out in a comment that my proposed solution violates the C++ aliasing rules, because p is of type unsigned int*, buffer is an array of unsigned char, and I dereference p to perform the shift. (Please ignore possible issues of alignment and byte order; I handle those outside of this snippet.) I was quite surprised to learn about the strict aliasing rule, since I regularly operate on data from buffers, casting it from one type to another, and have never had any issue. Further investigation revealed that the compiler I use (MSVC) doesn't enforce strict aliasing rules, and since I only develop with gcc/g++ in my spare time as a hobby, I likely just hadn't encountered the issue yet.

So then I asked a question about Strict Aliasing Rules and C++'s Placement new operator:

IsoCpp.org offers a FAQ regarding placement new and they provide the following code example:

#include <new>        // Must #include this to use "placement new"
#include "Fred.h"     // Declaration of class Fred
void someCode()
{
  char memory[sizeof(Fred)];     // Line #1
  void* place = memory;          // Line #2
  Fred* f = new(place) Fred();   // Line #3 (see "DANGER" below)
  // The pointers f and place will be equal
  // ...
}

The example is simple enough, but I'm asking myself, "What if someone calls a method on f, e.g. f->talk()?" At that point we would be dereferencing f, which points to the same memory location as memory (an array of char). I've read in numerous places that there is an exemption allowing a char* to alias any type, but I was under the impression that it wasn't a "two-way street": a char* can alias (read/write) an object of any type T, but a T* can only alias a char object if T is itself char. As I'm typing this, that doesn't make any sense to me, so I'm leaning towards the belief that the claim that my initial bit-shifting example violated the strict aliasing rule is false.

Can someone please explain what is correct? I've been going nuts trying to understand what is legal and what is not (despite having read numerous websites and SO posts on the topic).

Thank you

asked May 16 '16 17:05 by digitale




1 Answer

The aliasing rule means that the language only promises your pointer dereferences to be valid (i.e. not trigger undefined behaviour) if:

  • You access an object through a pointer of a compatible class: either its actual class or one of its superclasses, properly cast. This means that if B is a superclass of D and you have a D* d pointing to a valid D, accessing the object through the pointer returned by static_cast<B*>(d) is OK, but accessing it through the one returned by reinterpret_cast<B*>(d) is not. The latter does not account for where the B sub-object is laid out inside D.
  • You access it through a pointer to char (or unsigned char). Since char is byte-sized and byte-aligned, any data that can be read through a D* can also be read through a char*.
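
As a rough illustration of the two permitted cases above, here is a minimal sketch. The types Base, Other and Derived and both helper functions are hypothetical names invented for this example; Derived uses multiple inheritance only so that the Base sub-object is unlikely to sit at the start of the object.

```cpp
#include <cstddef>

// Hypothetical types, for illustration only.
struct Other   { double weight; };
struct Base    { int id; };
struct Derived : Other, Base { int extra; };  // Base is not the first base class

// Case 1: access through a properly converted base-class pointer.
// static_cast adjusts the address to the Base sub-object inside Derived;
// reinterpret_cast<Base*>(d) would reuse the address unchanged.
int read_id(Derived* d)
{
    Base* b = static_cast<Base*>(d);
    return b->id;
}

// Case 2: inspect the bytes of any object through an unsigned char pointer.
template <typename T>
std::size_t count_nonzero_bytes(const T& object)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&object);
    std::size_t count = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
    {
        if (bytes[i] != 0)
            ++count;
    }
    return count;
}
```

Both functions only read the object, but the same two routes are the ones the rule permits for writes as well.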

That said, other rules in the standard (in particular those about array layout and POD types) can be read as ensuring that you can use pointers and reinterpret_cast<T*> to alias two-way between POD types and char arrays, provided the char array has the appropriate size and alignment.

In other words, this is legal:

int* ia = new int[3];
char* pc = reinterpret_cast<char*>(ia);
// Possibly in some other function
int* pi = reinterpret_cast<int*>(pc);

While this may invoke undefined behaviour:

char* some_buffer; size_t offset; // Possibly passed in as an argument
int* pi = reinterpret_cast<int*>(some_buffer + offset);
pi[2] = -5;

Even if we can ensure that the buffer is big enough to contain three ints, the alignment might not be right. As with all instances of undefined behaviour, the compiler may do absolutely anything. Three common outcomes are:

  • The code might Just Work (TM) because on your platform the default alignment of all memory allocations is the same as that of int.
  • The pointer cast might round the address to the alignment of int (something like pi = pc & -4), potentially making you read/write to the wrong memory.
  • The pointer dereference itself may fail in some way: the CPU could reject misaligned accesses, making your application crash.
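
To make the alignment concern concrete, the following sketch tests whether an address is suitably aligned for a given type before any cast is attempted. The helper is_aligned_for is a hypothetical name, and the check assumes that converting a pointer to std::uintptr_t yields the numeric address, which holds on mainstream platforms but is implementation-defined.

```cpp
#include <cstdint>

// Hypothetical helper: true if p is a multiple of T's alignment requirement.
// Assumes the pointer-to-integer conversion yields the plain address.
template <typename T>
bool is_aligned_for(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

A check like this only addresses alignment; it says nothing about whether the buffer is large enough.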

Since you always want to ward off UB like the devil itself, you need a char array with the correct size and alignment. The easiest way to get that is simply to start with an array of the "right" type (int in this case), then fill it through a char pointer, which would be allowed since int is a POD type.
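
A minimal sketch of that approach, assuming the incoming bytes already hold ints in the machine's own byte order. The function name fill_ints_from_bytes and its parameters are made up for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper: the storage is declared as int by the caller, so it has
// the right size and alignment, and it is filled byte by byte through an
// unsigned char pointer.
void fill_ints_from_bytes(int* dest, const unsigned char* src, std::size_t count)
{
    unsigned char* raw = reinterpret_cast<unsigned char*>(dest);
    for (std::size_t i = 0; i < count * sizeof(int); ++i)
    {
        raw[i] = src[i];
    }
}
```

After the call, the caller keeps using dest as a plain int array, so no int* ever has to be produced from a char buffer.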

Addendum: after using placement new, you will be able to call any function on the object. If the construction is correct and does not invoke UB due to the above, then you have successfully created an object at the desired place, so any calls are OK, even if the object is non-POD (e.g. because it has virtual functions). After all, any allocator class will likely use placement new to create objects in the storage it obtains. Note that this is only necessarily true if you use placement new; other uses of type punning (e.g. naïve serialization with fread/fwrite) may result in an object that is incomplete or incorrect, because some values in the object need to be treated specially to maintain class invariants.
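
The placement new case can be sketched as follows. This Fred is a hypothetical stand-in for the FAQ's class, given a virtual function so that it is non-POD, and the alignas specifier is an addition that the FAQ snippet quoted in the question does not have.

```cpp
#include <new>      // placement new
#include <string>

// Hypothetical stand-in for the FAQ's Fred; non-POD because of the virtuals.
class Fred
{
public:
    virtual ~Fred() {}
    virtual std::string talk() const { return "hello"; }
};

std::string construct_and_talk()
{
    // Raw storage with the size and alignment that Fred requires.
    alignas(Fred) unsigned char memory[sizeof(Fred)];

    Fred* f = new (memory) Fred();   // a Fred object now lives in memory
    std::string said = f->talk();    // calls go through the pointer from new
    f->~Fred();                      // placement new requires manual destruction
    return said;
}
```

The calls use the pointer returned by the new expression rather than a cast of the buffer, which is the situation the addendum describes.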

answered Oct 25 '22 04:10 by Javier Martín