How can *i and u.i print different numbers in this code, even though i is defined as int *i = &u.i;? I can only assume that I'm triggering UB here, but I can't see how exactly.
(The ideone demo replicates this if I select 'C' as the language. But as @2501 pointed out, not if 'C99 strict' is the language. Then again, I get the problem with gcc-5.3.0 -std=c99!)
// gcc -fstrict-aliasing -std=c99 -O2
#include <stdio.h>

union
{
    int i;
    short s;
} u;

int * i = &u.i;
short * s = &u.s;

int main()
{
    *i = 2;
    *s = 100;
    printf(" *i = %d\n", *i); // prints 2
    printf("u.i = %d\n", u.i); // prints 100
    return 0;
}
(gcc 5.3.0, with -fstrict-aliasing -std=c99 -O2, also with -std=c11)
My theory is that 100 is the 'correct' answer, because the write to the union member through the short-lvalue *s is defined as such (for this platform/endianness/whatever). But I think that the optimizer doesn't realize that the write to *s can alias u.i, and therefore it thinks that *i=2; is the only line that can affect *i. Is this a reasonable theory?
If *s can alias u.i, and u.i can alias *i, then surely the compiler should think that *s can alias *i? Shouldn't aliasing be 'transitive'?
Finally, I always had this assumption that strict-aliasing problems were caused by bad casting. But there is no casting in this!
(My background is C++; I hope I'm asking a reasonable question about C here. My (limited) understanding is that, in C99, it is acceptable to write through one union member and then read through another member of a different type.)
"Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"
Aliasing: Aliasing refers to the situation where the same memory location can be accessed using different names. For example, if a function takes two pointers A and B which have the same value, then the name A[0] aliases the name B[0], i.e., we say the pointers A and B alias each other.
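As a minimal sketch of that definition (the function name `store_then_read` is made up for illustration), consider two pointers of the same type. Since both are `int *`, the compiler has to allow for them aliasing, so the final read must observe the second store when both point to the same object:

```c
/* Hypothetical helper: writes through a, then through b, then reads a.
 * Both parameters have type int *, so they are allowed to alias. */
int store_then_read(int *a, int *b)
{
    a[0] = 1;
    b[0] = 2;      /* if a == b, this overwrites the 1 stored above */
    return a[0];   /* 2 when a and b alias, 1 when they do not */
}
```

Calling `store_then_read(&x, &x)` gives 2, while `store_then_read(&x, &y)` with distinct `int` objects gives 1.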
The discrepancy is caused by the -fstrict-aliasing optimization option. Its behavior and possible traps are described in the GCC documentation:
Pay special attention to code like this:

union a_union {
  int i;
  double d;
};

int f() {
  union a_union t;
  t.d = 3.0;
  return t.i;
}

The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected. See Structures unions enumerations and bit-fields implementation. However, this code might not:

int f() {
  union a_union t;
  int* ip;
  t.d = 3.0;
  ip = &t.i;
  return *ip;
}
Note that a conforming implementation is perfectly allowed to take advantage of this optimization, as the second code example exhibits undefined behaviour. See Olaf's and others' answers for reference.
C standard (i.e. C11, n1570), 6.5p7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- ...
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
The lvalue expressions formed by dereferencing your pointers do not have union type, so this exception does not apply. The compiler is correct in exploiting this undefined behaviour.
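To make the consequence of this rule concrete, here is a minimal sketch (the function name `store_int_then_short` is hypothetical and not taken from the question). Because `int` and `short` are different, incompatible types, an optimizer relying on 6.5p7 may assume that the store through `sp` cannot modify `*ip`, and is therefore free to return the constant 2 without reloading `*ip`:

```c
/* Hypothetical helper mirroring the two stores in the question.
 * Only call it with pointers to distinct objects: under the reading
 * of 6.5p7 given above, passing pointers into the same union object
 * would make the behaviour undefined. */
int store_int_then_short(int *ip, short *sp)
{
    *ip = 2;
    *sp = 100;     /* assumed by the optimizer not to touch *ip */
    return *ip;    /* may legally be compiled as "return 2" */
}
```

With separate objects, for example an `int x` and a `short y`, the call `store_int_then_short(&x, &y)` returns 2 and leaves 100 in `y`, whether or not the reload is optimized away.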
Make the pointers point to the union type and access the respective member through them. That should work:
union {
...
} u, *i, *p;
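Spelled out in full, that suggestion might look like the sketch below. The union tag `int_or_short`, the pointer names `pi` and `ps`, and the function `agree_after_writes` are assumptions made for illustration; only the member layout comes from the question. Every access goes through an lvalue whose path starts at the union type, which is the form the GCC documentation describes as permitted type-punning:

```c
/* Same members as the union in the question; the tag name is hypothetical. */
union int_or_short {
    int i;
    short s;
};

union int_or_short u;
union int_or_short *pi = &u;   /* used to reach the int member */
union int_or_short *ps = &u;   /* used to reach the short member */

/* Performs the two writes from the question through the union type
 * and reports whether both ways of reading the int member agree. */
int agree_after_writes(void)
{
    pi->i = 2;
    ps->s = 100;
    return pi->i == u.i;   /* 1 if both reads see the same value */
}
```

The actual value of `u.i` after the second write still depends on the platform's `int` size and endianness, so the sketch only checks that `pi->i` and `u.i` agree, not what number they hold.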