
Is assigning to a union member from a different member in the same union defined by the C standard?

Consider:

union { int i; char c; } x = {0};
x.c = x.i;

C 2018 6.5.16.1 3, about simple assignment, says:

If the value being stored in an object is read from another object that overlaps in any way the storage of the first object, then the overlap shall be exact and the two objects shall have qualified or unqualified versions of a compatible type; otherwise, the behavior is undefined.

x.c and x.i overlap but not exactly and do not have versions of a compatible type. So is the behavior of this assignment not defined by the C standard?
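
To make "overlap but not exactly" concrete, here is a minimal sketch. The helper `overlap_is_exact` and the tag `union small` are hypothetical names introduced for illustration, and the sketch assumes the usual case where `sizeof(int)` is greater than 1.

```c
#include <stddef.h>

/* Hypothetical tag for the union from the question. */
union small { int i; char c; };

/* Hypothetical helper: two objects overlap exactly only if they
   start at the same address and occupy the same number of bytes. */
static int overlap_is_exact(const void *a, size_t a_size,
                            const void *b, size_t b_size)
{
    return a == b && a_size == b_size;
}
```

Both members start at the same address, so the storage overlaps, but on implementations where `int` is wider than `char` the sizes differ and the overlap is not exact.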

Eric Postpischil asked Nov 30 '20 16:11


1 Answer

In my opinion, you've found a flaw in the standard. I think the intent is that x.c = x.i; has undefined behavior, but that passage (which has been in the standard since C90) does not correctly express that intent.

That wording applies to simple assignment. In a simple assignment, the LHS must be an lvalue, and the RHS is just an expression. If the RHS happens to be an lvalue, it undergoes lvalue conversion as described in 6.3.2.1p2. After lvalue conversion, it is no longer an lvalue.

It would be reasonable to say that an assignment that copies one object to another object has undefined behavior when the objects overlap (unless the overlap is exact and the objects have compatible type). But assignment does not operate on two objects; it operates on an object (the LHS) and a non-lvalue expression (the RHS).

The passage says that the value "is read from another object". That's ambiguous. Must the name of the object be the entire RHS expression, or can it be just a subexpression? If the latter, then x.c = x.i + 1; would have undefined behavior, which in my opinion would be absurd.
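
However the passage is read, code that wants to stay clear of the question can copy the value into a separate, non-overlapping object first. This is only a sketch of that workaround, not something the standard prescribes; `union small` and `copy_i_to_c` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag for the union from the question. */
union small { int i; char c; };

/* Hypothetical helper: stores x->i into x->c by way of a temporary,
   so the value stored is read from an object that does not overlap x->c. */
static void copy_i_to_c(union small *x)
{
    assert(x != NULL);
    int tmp = x->i;     /* tmp is a distinct object, no overlap with *x */
    x->c = (char)tmp;   /* simple assignment from a non-overlapping object */
}
```

The temporary costs nothing in practice for an object this small, and the assignment no longer reads from storage that overlaps its destination.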

As written, the conditions given in the quoted passage cannot occur.

If the passage were to be corrected, it should apply only when the RHS is an lvalue (before lvalue conversion), and would discuss overlap between the object designated by the LHS and the object designated by the RHS.

The specific case here, assigning overlapping small integer objects, is not likely to cause problems in practice, but we could construct cases that are more problematic. For example:

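int main(void) {
    struct big {
        int array[1000];
    };

    struct big_wrapper {
        int n;          /* shifts b away from the start of the union */
        struct big b;
    };

    union u {
        struct big x;
        struct big_wrapper y;
    };

    union u obj;
    /* obj.x and obj.y.b overlap but start at different offsets */
    obj.x = obj.y.b;
}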

Here the assignment copies the value of one large object into another large object, where the two objects overlap but do not start at the same location. To implement this correctly, a compiler would have to detect the overlap and perhaps generate a call to memmove() or equivalent. And the overlap might not be directly visible if the assignment is performed via pointers. Making the behavior undefined allows compilers to generate efficient code for the non-overlapping cases.
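
If such a copy is really needed, one option is to write the overlap handling explicitly rather than rely on assignment. The following is a minimal sketch using `memmove`, which is specified to work when source and destination overlap; the helper name `copy_b_to_x` is hypothetical, and the types are hoisted to file scope for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct big { int array[1000]; };
struct big_wrapper { int n; struct big b; };
union u { struct big x; struct big_wrapper y; };

/* Hypothetical helper: copies obj->y.b onto obj->x.
   memmove behaves as if the bytes went through a temporary buffer,
   so the partially overlapping regions are handled correctly. */
static void copy_b_to_x(union u *obj)
{
    assert(obj != NULL);
    memmove(&obj->x, &obj->y.b, sizeof obj->x);
}
```

An alternative with the same effect is to copy `obj->y.b` into a local `struct big` and assign that local to `obj->x`, at the cost of a second copy.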

The intent, I believe, is that this assignment also has undefined behavior, but again the lvalue conversion means that the intent is not clearly expressed.

In my opinion, that passage should be updated to state that it applies when the RHS is an lvalue, and that the relevant objects are the LHS and the pre-conversion RHS.

Keith Thompson answered Oct 22 '22 18:10