
Accessing C union members via pointers

Tags: c, pointers, unions

Does accessing union members via a pointer, as in the example below, result in undefined behavior in C99? The intent seems clear enough, but I know that there are some restrictions regarding aliasing and unions.

union { int i; char c; } u;

int  *ip = &u.i;
char *ic = &u.c;

*ip = 0;
*ic = 'a';
printf("%c\n", u.c);
Dara Hazeghi asked May 29 '13 02:05




2 Answers

Reading a union through any member other than the one last written yields an unspecified value (subtly different from undefined behaviour). C99 Annex J lists this:

The following are unspecified:
   :
   The value of a union member other than the last one stored into (6.2.6.1).

However, since you are writing to c via the pointer, then reading c, this particular example is well defined. It does not matter how you write to the element:

u.c = 'a';        // direct write.
*(&(u.c)) = 'a';  // variation on yours, writing through element pointer.
(&u)->c = 'a';    // writing through union pointer.

One issue raised in the comments seems, at least on the surface, to contradict that. User davmac provides this sample code:

// Compile with "-O3 -std=c99" eg:
//  clang -O3 -std=c99 test.c
//  gcc -O3 -std=c99 test.c
// On clang v3.5.1, output is "123"
// On gcc 4.8.4, output is "1073741824"
//
// Different outputs, so either:
// * program invokes undefined behaviour; both compilers are correct OR
// * compiler vendors interpret standard differently OR
// * one compiler or the other has a bug

#include <stdio.h>

union u
{
    int i;
    float f;
};

int someFunc(union u * up, float *fp)
{
    up->i = 123;
    *fp = 2.0;     // does this set the union member?
    return up->i;  // then this should not return 123!
}

int main(int argc, char **argv)
{
    union u uobj;
    printf("%d\n", someFunc(&uobj, &uobj.f));
    return 0;
}

which outputs different values on different compilers. However, I believe this is because the code breaks the rules: it writes to member f and then reads member i which, as shown in Annex J, is unspecified.

There is a footnote 82 in 6.5.2.3 which states:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type.

However, since this seems to go against the Annex J comment and it's a footnote to the section dealing with expressions of the form x.y, it may not apply to accesses via a pointer.

One of the major reasons why aliasing is supposed to be strict is to allow the compiler more scope for optimisation. To that end, the standard dictates that treating memory as a different type from the one it was written as is unspecified.

By way of example, consider the function provided:

int someFunc(union u * up, float *fp)
{
    up->i = 123;
    *fp = 2.0;     // does this set the union member?
    return up->i;  // then this should not return 123!
}

The implementation is free to assume that, because you're not supposed to alias memory, up->i and *fp are two distinct objects. It can therefore assume that up->i does not change after being set to 123, and simply return 123 without looking at the actual variable contents again.

If, instead, you changed the statement that writes through the pointer to:

up->f = 2.0;

then that would make footnote 82 applicable and the returned value would be a re-interpretation of the float as an integer.

The reason I don't think that's an issue for the question is that you're writing and then reading the same type, so the aliasing rules don't come into play.


It's interesting to note that the unspecified behaviour is caused not by the function itself, but by calling it thus:

union u up;
int x = someFunc (&up, &(up.f)); // <- aliasing here

If you were instead to call it so:

union u up;
float down;
int x = someFunc (&up, &down); // <- no aliasing

that would not be a problem.

paxdiablo answered Sep 27 '22 23:09


No, it won't, but you need to keep track of the last type you stored in the union. If I were to reverse the order of your int and char assignments, it would be a very different story:

#include <stdio.h>

union { int i; char c; } u;

int main()
{
    int  *ip = &u.i;
    char *ic = &u.c;

    *ic = 'a';
    *ip = 123456;

    printf("%c\n", u.c); /* trying to print a char even though 
                            it's currently storing an int,
                            in this case it prints '@' on my machine */

    return 0;
}

EDIT: Some explanation on why it may have printed 64 ('@').

The binary representation of 123456 is 0001 1110 0010 0100 0000.

For 64 it is 0100 0000.

You can see that the lowest 8 bits are identical. Since u.c reads only the first byte of the union, which on a little-endian machine holds those lowest 8 bits, printf prints just that byte.

Nobilis answered Sep 28 '22 01:09