
Do I understand C/C++ strict-aliasing correctly?

Tags:

c++

c

strict-aliasing

I've read this article about C/C++ strict aliasing. I think the same applies to C++.

As I understand it, strict aliasing lets the compiler rearrange code for performance optimization. That's why two pointers of different (and, in the C++ case, unrelated) types cannot refer to the same memory location.

Does this mean that problems can occur only if the memory is modified, apart from possible problems with memory alignment?

Take, for example, handling a network protocol or de-serialization. I have a dynamically allocated byte array, and the packet struct is properly aligned within it. Can I reinterpret_cast it to my packet struct?

char const* buf = ...; // dynamically allocated
// [shift] satisfies alignment requirements; the cast keeps const, since reinterpret_cast cannot remove it
unsigned int i = *reinterpret_cast<unsigned int const*>(buf + shift);


asked Sep 06 '11 14:09

Andriy Tylychko

1 Answer

The problem here is not strict aliasing so much as structure representation requirements.

First, it is safe to alias between char, signed char, or unsigned char and any one other type (in your case, unsigned int). This allows you to write your own memory-copy loops, as long as they're defined using a char type. This is authorized by the following language in C99 (§6.5):

6. The effective type of an object for an access to its stored value is the declared type of the object, if any. [Footnote: Allocated objects have no declared type] [...] If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

7. An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [Footnote: The intent of this list is to specify those circumstances in which an object may or may not be aliased.]

a type compatible with the effective type of the object,

[...]

a character type.

Similar language can be found in the C++0x draft N3242 §3.10/10, although it is less clear there when the 'dynamic type' of an object is assigned. (I'd appreciate any further references on what the dynamic type is of a char array into which a POD object has been copied, as a char array, with proper alignment.)
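To make the "memory-copy loop" point concrete, here is a minimal sketch of a copy loop written in terms of a character type. The names copy_bytes and load_uint are my own, invented for illustration; the sketch does no byte-order conversion and assumes the buffer holds at least sizeof(unsigned int) bytes.

```cpp
#include <cassert>
#include <cstddef>

// Copies n bytes from src to dst one byte at a time. The bytes of any object
// may be accessed through unsigned char lvalues. The regions must not overlap.
void copy_bytes(void* dst, const void* src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = s[i];
    }
}

// Reads an unsigned int out of a byte buffer by copying its bytes instead of
// casting the buffer pointer. No byte-order conversion is performed here.
unsigned int load_uint(const char* buf) {
    unsigned int value = 0;
    copy_bytes(&value, buf, sizeof value);
    return value;
}
```

Copying into a real unsigned int object also sidesteps the alignment question, because the buffer offset no longer has to be suitably aligned.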

As such, aliasing is not a problem here. However, a strict reading of the standard indicates that a C++ implementation has a great deal of freedom in choosing a representation of an unsigned int.

As one arbitrary example, an unsigned int might be a 24-bit integer stored in four bytes, with 8 padding bits interspersed. If any of those padding bits does not match a certain (constant) pattern, the value is a trap representation, and dereferencing the pointer can crash the program. Is this a likely implementation? Perhaps not. But historically there have been systems with parity bits and other oddities, so by a strict reading of the standard, reading directly from the network into an unsigned int is not kosher.
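If you want to know whether your implementation has such padding bits, one way (a sketch, not something the standard requires you to write) is to compare the number of value bits with the number of storage bits. The constant and function names below are my own.

```cpp
#include <climits>
#include <limits>

// Value bits in unsigned int, as reported by the implementation.
constexpr int kValueBits = std::numeric_limits<unsigned int>::digits;

// Total bits occupied by an unsigned int object in memory.
constexpr int kStorageBits = static_cast<int>(sizeof(unsigned int) * CHAR_BIT);

// True when some bits of the object representation are not value bits.
constexpr bool unsigned_int_has_padding_bits() {
    return kValueBits != kStorageBits;
}
```

On the hypothetical 24-bit-in-four-bytes machine described above, kValueBits would be 24 and kStorageBits 32, so the function would return true. Wrapping the check in a static_assert turns the assumption into a compile-time error on such a platform.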

Padding bits are a largely theoretical issue on systems in use today, but they are worth noting. If you plan to stick to PC hardware, you don't really need to worry about them (but don't forget your ntohl calls: endianness is still a problem!).
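As an illustration of the endianness point, here is a sketch that decodes a 32-bit network-order (big-endian) value from bytes using shifts, which gives the same result on little- and big-endian hosts. The function name read_u32_be is hypothetical, and the sketch assumes 8-bit bytes and that std::uint32_t exists on your platform.

```cpp
#include <cstdint>

// Decodes a 32-bit big-endian (network byte order) value from four bytes.
// Each byte is widened before shifting, so the result does not depend on
// the byte order of the host.
std::uint32_t read_u32_be(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}
```

This does the job of a copy followed by ntohl in one step, without needing the platform's socket headers.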

Structures make it even worse, of course: alignment and padding depend on your platform. I have worked on an embedded platform on which all types have an alignment of 1, so no padding is ever inserted into structures. Using the same structure definitions on multiple platforms can therefore produce inconsistent layouts. You can either work out the byte offsets of the structure members manually and reference them directly, or use a compiler-specific alignment directive to control padding.
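Here is a rough sketch of the "work out the byte offsets manually" option. The PacketHeader layout, the offsets, and the function name are all invented for illustration (a 2-byte type, a 4-byte length, and a 1-byte flags field, big-endian on the wire); substitute your protocol's actual layout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory form of a packet header. Its native layout may
// contain padding, so it is never overlaid directly on the wire bytes.
struct PacketHeader {
    std::uint16_t type;
    std::uint32_t length;
    std::uint8_t flags;
};

// Wire layout (assumed for this example): offsets in bytes, big-endian fields.
constexpr std::size_t kTypeOffset = 0;
constexpr std::size_t kLengthOffset = 2;
constexpr std::size_t kFlagsOffset = 6;
constexpr std::size_t kWireSize = 7;

// Decodes the header field by field. Returns false if the buffer is too short.
bool parse_header(const unsigned char* buf, std::size_t size, PacketHeader& out) {
    if (size < kWireSize) {
        return false;
    }
    out.type = static_cast<std::uint16_t>(
        (static_cast<unsigned>(buf[kTypeOffset]) << 8) | buf[kTypeOffset + 1]);
    out.length = (static_cast<std::uint32_t>(buf[kLengthOffset]) << 24) |
                 (static_cast<std::uint32_t>(buf[kLengthOffset + 1]) << 16) |
                 (static_cast<std::uint32_t>(buf[kLengthOffset + 2]) << 8) |
                  static_cast<std::uint32_t>(buf[kLengthOffset + 3]);
    out.flags = buf[kFlagsOffset];
    return true;
}
```

Note that the wire size here is 7 bytes, while sizeof(PacketHeader) is typically larger on mainstream compilers because of padding. That mismatch is exactly why a direct cast of the buffer to the struct is fragile.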

So you must be careful when directly casting from a network buffer to native types or structures. But the aliasing itself is not a problem in this case.

answered Sep 23 '22 12:09

bdonlan
