Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Do I understand C/C++ strict-aliasing correctly?

I've read this article about C/C++ strict aliasing. I think the same applies to C++.

As I understand, strict aliasing is used to rearrange the code for performance optimization. That's why two pointers of different (and unrelated in C++ case) types cannot refer to the same memory location.

Does this mean that problems can occur only if memory is modified? Apart of possible problems with memory alignment.

For example, handling network protocol, or de-serialization. I have a byte array, dynamically allocated and packet struct is properly aligned. Can I reinterpret_cast it to my packet struct?

char const* buf = ...; // dynamically allocated
unsigned int i = *reinterpret_cast<unsigned int*>(buf + shift); // [shift] satisfies alignment requirements
like image 506
Andriy Tylychko Avatar asked Sep 06 '11 14:09

Andriy Tylychko


People also ask

Does C have strict aliasing?

One of the lesser-known secrets of the C programming language is the so-called “strict aliasing rule”. This is a shame, because failing to adhere to it takes you (along with your code) straight into the realm of undefined behavior.

Why is aliasing strict?

This is done because they referred to the same memory location. Strict Aliasing: GCC compiler makes an assumption that pointers of different types will never point to the same memory location i.e., alias of each other. Strict aliasing rule helps the compiler to optimize the code.

What is pointer aliasing in C?

Pointer aliasing is a hidden kind of data dependency that can occur in C, C++, or any other language that uses pointers for array addresses in arithmetic operations. Array data identified by pointers in C can overlap, because the C language puts very few restrictions on pointers.

What is aliasing in programming?

An alias occurs when different variables point directly or indirectly to a single area of storage. Aliasing refers to assumptions made during optimization about which variables can point to or occupy the same storage area.


1 Answers

The problem here is not strict aliasing so much as structure representation requirements.

First, it is safe to alias between char, signed char, or unsigned char and any one other type (in your case, unsigned int. This allows you to write your own memory-copy loops, as long as they're defined using a char type. This is authorized by the following language in C99 (§6.5):

 6. The effective type of an object for an access to its stored value is the declared type of the object, if any. [Footnote: Allocated objects have no declared type] [...] If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

 7. An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [Footnote: The intent of this list is to specify those circumstances in which an object may or may not be aliased.]

  • a type compatible with the effective type of the object,
  • [...]
  • a character type.

Similar language can be found in the C++0x draft N3242 §3.11/10, although it is not as clear when the 'dynamic type' of an object is assigned (I'd appreciate any further references on what the dynamic type is of a char array, to which a POD object has been copied as a char array with proper alignment).

As such, aliasing is not a problem here. However, a strict reading of the standard indicates that a C++ implementation has a great deal of freedom in choosing a representation of an unsigned int.

As one random example, unsigned ints might be a 24-bit integer, represented in four bytes, with 8 padding bits interspersed; if any of these padding bits does not match a certain (constant) pattern, it is viewed as a trap representation, and dereferencing the pointer will result in a crash. Is this a likely implementation? Perhaps not. But there have been, historically, systems with parity bits and other oddness, and so directly reading from the network into an unsigned int, by a strict reading of the standard, is not kosher.

Now, the problem of padding bits is mostly a theoretical issue on most systems today, but it's worth noting. If you plan to stick to PC hardware, you don't really need to worry about it (but don't forget your ntohls - endianness is still a problem!)

Structures make it even worse, of course - alignment representations depend on your platform. I have worked on an embedded platform in which all types have an alignment of 1 - no padding is ever inserted into structures. This can result in inconsistencies when using the same structure definitions on multiple platforms. You can either manually work out the byte offsets for data structure members and reference them directly, or use a compiler-specific alignment directive to control padding.

So you must be careful when directly casting from a network buffer to native types or structures. But the aliasing itself is not a problem in this case.

like image 78
bdonlan Avatar answered Sep 23 '22 12:09

bdonlan