Assume I have a union like this:
union buffer {
struct { T* data; int count; int capacity; };
struct { void* data; int count; int capacity; } __type_erased;
};
Will I get into trouble if I mix reads/writes to the anonymous struct members and __type_erased members under C11 aliasing rules?
More specifically, I am interested in the behaviour that occurs if the components are accessed independently (e.g. via different pointers). To illustrate:
grow_buffer(&buffer.__type_erased);
buffer.data[buffer.count] = ...
I have read all the relevant questions I could find, but I am still not 100% clear on this, as some people suggest that such behaviour is undefined while others say that it is legal. Furthermore, the information I find is a mix of C++, C99, C11 etc. rules that is quite difficult to digest. Here, I am explicitly interested in the behaviour mandated by C11 and exhibited by popular compilers (Clang, GCC).
I have now performed some experiments with multiple compilers and decided to share my findings in case someone runs into a similar issue. The background of my question is that I was trying to write a user-friendly, high-performance generic dynamic array implementation in plain C. The idea is that array operations are carried out using macros, and heavy-duty operations (like growing the array) are performed using an aliased type-erased template struct. E.g., I can have a macro like this:
#define ALLOC_ONE(A)\
(_array_ensure_size(&A.__type_erased, A.count+1), A.count++)
that grows the array if necessary and returns the index of the newly allocated item. The spec (6.5.2.3) states that accessing the same location via different union members is allowed. My interpretation is that while _array_ensure_size() is not aware of the union type, the compiler should be aware that the member __type_erased can potentially be mutated by a side effect. That is, I'd assume that this should work. However, it seems that this is a grey zone (and to be honest, the spec is really not clear about what constitutes a member access). Apple's latest Clang (clang-800.0.33.1) has no problems with it: the code compiles without warnings and runs as expected. However, when compiled with GCC 5.3.0, the code crashes with a segfault. In fact, I have a strong suspicion that GCC's behaviour is a bug. I tried making the union member mutation explicit by removing the mutable pointer ref and adopting a clear functional style, e.g.:
#define ALLOC_ONE(A) \
(A.__type_erased = _array_ensure_size(A.__type_erased, A.count+1),\
A.count++)
This again works with Clang, as expected, but crashes again when built with GCC. My conclusion is that advanced type manipulation with unions is a grey area where one should tread carefully.
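One way to sidestep the question entirely is to avoid handing out a pointer to a type-erased struct and pass only values across the call boundary. The following is a minimal sketch of that idea, not the code from the experiments above: `array_reserve`, `struct int_array`, and this variant of `ALLOC_ONE` are hypothetical names, and the growth policy (start at 8, double) is an arbitrary assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns storage for at least `needed` elements of
   `elem_size` bytes each. The data pointer travels as a void* value and
   the capacity as a real int object, so nothing is accessed through a
   mismatched struct type. */
void *array_reserve(void *data, int *capacity, int needed, size_t elem_size)
{
    assert(needed >= 0 && elem_size > 0);
    if (needed <= *capacity)
        return data;

    int new_cap = *capacity ? *capacity : 8;   /* assumed growth policy */
    while (new_cap < needed)
        new_cap *= 2;

    void *p = realloc(data, (size_t)new_cap * elem_size);
    if (p == NULL)
        abort();                               /* sketch: no recovery path */
    *capacity = new_cap;
    return p;
}

/* Example instantiation for int elements. */
struct int_array { int *data; int count; int capacity; };

/* Grows the array if necessary and evaluates to the index of the new slot. */
#define ALLOC_ONE(A) \
    ((A).data = array_reserve((A).data, &(A).capacity, (A).count + 1, \
                              sizeof *(A).data), \
     (A).count++)
```

With a zero-initialised `struct int_array a`, `a.data[ALLOC_ONE(a)] = value;` appends one element. The macro stays generic because `sizeof *(A).data` supplies the element size, and the implicit conversion from `void*` back to `T*` happens in the assignment.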
"Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"
GCC assumes that pointers to different types never point to the same memory location, i.e. that they never alias each other. The strict aliasing rule helps the compiler optimize the code.
In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule.
Pointer aliasing is a hidden kind of data dependency that can occur in C, C++, or any other language that uses pointers for array addresses in arithmetic operations. Array data identified by pointers in C can overlap, because the C language puts very few restrictions on pointers.
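To make the aliasing assumption concrete, here is a small sketch with two hypothetical functions. `bytes_equal` uses the one escape hatch that always applies (any object may be inspected through a character type), while `store_both` shows the kind of code where the compiler is entitled to assume that the two pointers refer to different objects.

```c
#include <assert.h>
#include <stddef.h>

/* Allowed: an object of any type may be read through an unsigned char
   lvalue, so this byte-wise comparison is valid for any two objects. */
int bytes_equal(const void *a, const void *b, size_t n)
{
    assert(a && b);
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    for (size_t i = 0; i < n; i++)
        if (pa[i] != pb[i])
            return 0;
    return 1;
}

/* int and float may not alias each other. The compiler may therefore
   assume the store to *f cannot change *i and return 1 without
   reloading *i. Calling this with both pointers aimed at the same
   memory is undefined behaviour. */
int store_both(int *i, float *f)
{
    *i = 1;
    *f = 0.0f;
    return *i;
}
```

Called with two distinct objects, `store_both` returns 1 at every optimization level. The trouble described in the question starts only when the two pointers actually overlap.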
The C11 standard says the following:
6.5.2.3 Structure and union members
95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
So, as far as reading and writing union members is concerned, this is correct in C11. However, strict aliasing is a type-based analysis, so a naive implementation may treat these reads and writes as independent. As I understand it, modern GCC can detect such cases involving union members and avoid these errors.
Also, you should remember that some uses of pointers to union members are invalid:
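As an illustration of the footnote, here is a minimal sketch of type punning done directly through union member access, which is the form the footnote covers. The names `union pun` and `float_bits` are made up for this example, and it assumes a 32-bit `float`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

union pun { float f; uint32_t u; };

/* Stores through one member and reads through the other. Both accesses
   name the union object itself, so the compiler can see they overlap. */
uint32_t float_bits(float x)
{
    union pun p;
    p.f = x;
    return p.u;
}
```

On a platform with IEEE 754 single precision, `float_bits(1.0f)` yields `0x3F800000`. The important detail is that no pointer to an individual member ever leaves the function.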
The following is not a valid fragment (because the union type is not visible within function f):
struct t1 { int m; };
struct t2 { int m; };
int f(struct t1 *p1, struct t2 *p2)
{
    if (p1->m < 0)
        p2->m = -p2->m;
    return p1->m;
}
int g()
{
    union {
        struct t1 s1;
        struct t2 s2;
    } u;
    /* ... */
    return f(&u.s1, &u.s2);
}
In my opinion, using unions to read and write through different members is dangerous, and it is better to avoid it.
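For contrast, here is a sketch of how the same logic can be written so that the union type is visible at every access. `union u12` and `f_visible` are hypothetical names introduced for this example. The function receives a pointer to the union and reaches each struct through its union member.

```c
#include <assert.h>

struct t1 { int m; };
struct t2 { int m; };

/* The union is declared at file scope, so it is visible inside the function. */
union u12 { struct t1 s1; struct t2 s2; };

/* Every access goes through a member of *u, so the compiler knows that
   s1 and s2 share storage. */
int f_visible(union u12 *u)
{
    assert(u);
    if (u->s1.m < 0)
        u->s2.m = -u->s2.m;
    return u->s1.m;
}
```

The trade-off is that the callee now has to know the union type, which is exactly what a type-erased helper is trying to avoid.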