Does accessing union members via a pointer, as in the example below, result in undefined behavior in C99? The intent seems clear enough, but I know that there are some restrictions regarding aliasing and unions.
union { int i; char c; } u;
int *ip = &u.i;
char *ic = &u.c;
*ip = 0;
*ic = 'a';
printf("%c\n", u.c);
Only one member of a union holds a value at any given time. Union members are accessed with the same syntax as structure members: the dot operator (.) accesses a member directly, and the arrow operator (->) accesses a member through a pointer.
A union member can have almost any object type; as with structures, incomplete types and function types are not allowed.
To access members of a structure using pointers, we use the -> operator. For example, given a structure variable person1, the statement personPtr = &person1; stores its address in the pointer personPtr, after which the members of person1 can be accessed through personPtr.
A pointer to a union can be cast to a pointer to each of its members (if a union has bit field members, the pointer to a union can be cast to the pointer to the bit field's underlying type). Likewise, a pointer to any member of a union can be cast to a pointer to the enclosing union.
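The pointer conversions described above can be checked with a short sketch. The union type `sample` and the function `members_share_address` are hypothetical names introduced only for illustration; the sketch takes addresses only and never reads a member.

```c
/* Hypothetical union mirroring the one in the question. */
union sample { int i; char c; };

/* Returns 1 when the union and each of its members start at the same
   address, and the casts in both directions round-trip. */
int members_share_address(void)
{
    union sample u;
    void *whole = &u;
    int *ip = (int *)&u;                      /* union pointer -> member pointer */
    char *cp = (char *)&u;                    /* union pointer -> member pointer */
    union sample *back = (union sample *)ip;  /* member pointer -> union pointer */

    return whole == (void *)&u.i
        && whole == (void *)&u.c
        && ip == &u.i
        && cp == &u.c
        && back == &u;
}
```

Calling `members_share_address()` should return 1 on a conforming implementation, since every member of a union begins at the start of the union object.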
It is unspecified (subtly different from undefined) behaviour to access a union by any element other than the one that was last written. That's detailed in C99 annex J:
The following are unspecified:
:
The value of a union member other than the last one stored into (6.2.6.1).
However, since you are writing to c via the pointer, then reading c, this particular example is well defined. It does not matter how you write to the element:
u.c = 'a'; // direct write.
*(&(u.c)) = 'a'; // variation on yours, writing through element pointer.
(&u)->c = 'a'; // writing through union pointer.
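As a minimal sketch of that point, the three forms of write can be exercised one after another, each followed by a read of the same member. The names `sample` and `count_matching_reads` are assumptions made for this example, not part of the original code.

```c
/* Hypothetical union mirroring the one in the question. */
union sample { int i; char c; };

/* Writes member c three different ways and counts how many times the
   read that follows sees the value just written. */
int count_matching_reads(void)
{
    union sample u;
    int matches = 0;

    u.c = 'a';                 /* direct write */
    matches += (u.c == 'a');

    *(&u.c) = 'b';             /* write through a pointer to the member */
    matches += (u.c == 'b');

    (&u)->c = 'c';             /* write through a pointer to the union */
    matches += (u.c == 'c');

    return matches;
}
```

Each read uses the member that was last stored, so all three comparisons should succeed and the function should return 3.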
One issue raised in the comments seems, at least on the surface, to contradict that. User davmac provides sample code:
// Compile with "-O3 -std=c99" eg:
// clang -O3 -std=c99 test.c
// gcc -O3 -std=c99 test.c
// On clang v3.5.1, output is "123"
// On gcc 4.8.4, output is "1073741824"
//
// Different outputs, so either:
// * program invokes undefined behaviour; both compilers are correct OR
// * compiler vendors interpret standard differently OR
// * one compiler or the other has a bug
#include <stdio.h>
union u
{
int i;
float f;
};
int someFunc(union u * up, float *fp)
{
up->i = 123;
*fp = 2.0; // does this set the union member?
return up->i; // then this should not return 123!
}
int main(int argc, char **argv)
{
union u uobj;
printf("%d\n", someFunc(&uobj, &uobj.f));
return 0;
}
which outputs different values on different compilers. However, I believe this happens because the code breaks the rules: it writes to member f, then reads member i, and, as shown in Annex J, that's unspecified.
Footnote 82 in 6.5.2.3 states:
If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type.
However, since this seems to go against the Annex J comment and it's a footnote to the section dealing with expressions of the form x.y, it may not apply to accesses via a pointer.
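To make the reinterpretation in footnote 82 concrete, here is a hedged sketch that reads a float's object representation in two ways: through a union member, as the footnote describes, and through `memcpy`, which copies the bytes without relying on union semantics at all. The helper names `bits_via_union` and `bits_via_memcpy` are hypothetical, and the sketch assumes `float` and `uint32_t` have the same size.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide. Compilation fails otherwise
   because the array size becomes negative. */
typedef char float_is_32_bits[sizeof(float) == sizeof(uint32_t) ? 1 : -1];

/* Hypothetical helper: store through member f, read through member bits. */
uint32_t bits_via_union(float value)
{
    union { float f; uint32_t bits; } pun;
    pun.f = value;
    return pun.bits;
}

/* Hypothetical helper: copy the object representation byte by byte. */
uint32_t bits_via_memcpy(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

On an implementation that uses IEEE 754 single precision, both helpers should return 0x40000000 for 2.0f, which is 1073741824 in decimal, the same number gcc printed for the sample program.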
One of the major reasons why aliasing is supposed to be strict is to allow the compiler more scope for optimisation. To that end, the standard dictates that accessing memory as a type different from the one it was written as is unspecified.
By way of example, consider the function provided:
int someFunc(union u * up, float *fp)
{
up->i = 123;
*fp = 2.0; // does this set the union member?
return up->i; // then this should not return 123!
}
The implementation is free to assume that, because you're not supposed to alias memory, up->i and *fp are two distinct objects. So it's free to assume that you're not changing the value of up->i after you set it to 123, so it can simply return 123 without looking at the actual variable contents again.
If instead, you changed the pointer setting statement to:
up->f = 2.0;
then that would make footnote 82 applicable and the returned value would be a re-interpretation of the float as an integer.
I don't think that's an issue for the question, because you're writing and then reading the same type, hence aliasing rules don't come into play.
It's interesting to note that the unspecified behaviour is caused not by the function itself, but by calling it thus:
union u up;
int x = someFunc (&up, &(up.f)); // <- aliasing here
If you were instead to call it so:
union u up;
float down;
int x = someFunc (&up, &down); // <- no aliasing
that would not be a problem.
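The non-aliasing call can be put together as a small sketch. `union u` and `someFunc` follow the sample code above; the wrapper `call_with_separate_float` is a hypothetical name added for illustration.

```c
union u
{
    int i;
    float f;
};

/* Same shape as the function under discussion. */
int someFunc(union u *up, float *fp)
{
    up->i = 123;
    *fp = 2.0f;     /* writes to whatever float fp points at */
    return up->i;   /* reads back the member that was last stored */
}

/* Hypothetical wrapper: fp points at a float outside the union,
   so the two pointers never refer to the same storage. */
int call_with_separate_float(void)
{
    union u up;
    float down;
    return someFunc(&up, &down);
}
```

Because `down` is a distinct object, the store through `fp` cannot touch `up`, and the function should return 123 regardless of optimisation level.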
No, it won't, but you need to keep track of the last type you put into the union. If I were to reverse the order of your int and char assignments it would be a very different story:
#include <stdio.h>
union { int i; char c; } u;
int main()
{
int *ip = &u.i;
char *ic = &u.c;
*ic = 'a';
*ip = 123456;
printf("%c\n", u.c); /* trying to print a char even though
it's currently storing an int,
in this case it prints '@' on my machine */
return 0;
}
EDIT: Some explanation on why it may have printed 64 ('@').
The binary representation of 123456 is 0001 1110 0010 0100 0000.
For 64 it is 0100 0000.
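You can see that the low-order 8 bits are identical. On a little-endian machine those bits are stored in the first byte of the int, which is the byte u.c occupies, so printf, which is told to print a single char, prints only that much.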
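The arithmetic can be checked with a short sketch. Both function names are hypothetical, and the sketch assumes int is at least 32 bits wide so that 123456 fits.

```c
/* Hypothetical helper: the low-order 8 bits of a value. */
unsigned low_order_byte(unsigned value)
{
    return value & 0xFFu;
}

/* Hypothetical helper: the byte stored at the lowest address of an int,
   which is the byte a char member of the union would overlay. */
unsigned first_stored_byte(int value)
{
    unsigned char *bytes = (unsigned char *)&value;
    return bytes[0];
}
```

`low_order_byte(123456u)` is 64 (0x40) on any platform. `first_stored_byte(123456)` matches it only on little-endian machines; on a big-endian machine with a 32-bit int the first byte would be 0 instead, so the '@' output is not portable.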