Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is the strict aliasing rule incorrectly specified?

As previously established, a union of the form

union some_union {     type_a member_a;     type_b member_b;     ... }; 

with n members comprises n + 1 objects in overlapping storage: One object for the union itself and one object for each union member. It is clear, that you may freely read and write to any union member in any order, even if reading a union member that was not the last one written to. The strict aliasing rule is never violated, as the lvalue through which you access the storage has the correct effective type.

This is further supported by footnote 95, which explains how type punning is an intended use of unions.

A typical example of the optimizations enabled by the strict aliasing rule is this function:

int strict_aliasing_example(int *i, float *f) {     *i = 1;     *f = 1.0;     return (*i); } 

which the compiler may optimize to something like

int strict_aliasing_example(int *i, float *f) {     *i = 1;     *f = 1.0;     return (1); } 

because it can safely assume that the write to *f does not affect the value of *i.

However, what happens when we pass two pointers to members of the same union? Consider this example, assuming a typical platform where float is an IEEE 754 single precision floating point number and int is a 32 bit two's complement integer:

int breaking_example(void) {     union {         int i;         float f;     } fi;      return (strict_aliasing_example(&fi.i, &fi.f)); } 

As previously established, fi.i and fi.f refer to an overlapping memory region. Reading and writing them is unconditionally legal (writing is only legal once the union has been initialized) in any order. In my opinion, the previously discussed optimization performed by all major compilers yields incorrect code as the two pointers of different type legally point to the same location.

I somehow can't believe that my interpretation of the strict aliasing rule is correct. It doesn't seem plausible that the very optimization the strict aliasing was designed for is not possible due to the aforementioned corner case.

Please tell me why I'm wrong.

A related question turned up during research.

Please read all existing answers and their comments before adding your own to make sure that your answer adds a new argument.

like image 444
fuz Avatar asked Aug 05 '16 21:08

fuz


People also ask

What is the strict aliasing rule?

The strict aliasing rule dictates that pointers are assumed not to alias if they point to fundamentally different types, except for char* and void* which can alias to any other data type.

What is the problem of aliasing while using pointers?

Because any pointer could alias any other pointer in C, the compiler must assume that memory regions accessed through these pointers can overlap, which prevents many possible optimizations. C++ enables more optimizations, as pointer arguments will not be treated as possible aliases if they point to different types.

What is aliasing in C++?

In C, C++, and some other programming languages, the term aliasing refers to a situation where two different expressions or symbols refer to the same object.

What is aliasing in programming?

An alias occurs when different variables point directly or indirectly to a single area of storage. Aliasing refers to assumptions made during optimization about which variables can point to or occupy the same storage area.


2 Answers

Starting with your example:

int strict_aliasing_example(int *i, float *f) {     *i = 1;     *f = 1.0;     return (*i); } 

Let's first acknowledge that, in the absence of any unions, this would violate the strict aliasing rule if i and f both point to the same object; assuming the object has no declared type, then *i = 1 sets the effective type to int and *f = 1.0 then sets it to float, and the final return (*i) then accesses an object with effective type of float via an lvalue of type int, which is clearly not allowed.

The question is about whether this would still amount to a strict-aliasing violation if both i and f point to members of the same union. For this not to be the case, it would either have to be that there is some special exemption from the strict aliasing rule that applies in this situation, or that accessing the object via *i does not (also) access the same object as *f.

On union member access via the "." member access operator, the standard says (6.5.2.3):

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member (95) and is an lvalue if the first expression is an lvalue.

The footnote 95 referred to in above says:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

This is clearly intended to allow type punning via a union, but it should be noted that (1) footnotes are non-normative, that is, they are not supposed to proscribe behaviour, but rather they should clarify the intention of some part of the text in accordance with the rest of the specification, and (2) this allowance for type punning via a union is deemed by compiler vendors as applying only for access via the union member access operator - since otherwise strict aliasing is pretty useless for optimisation, as just about any two pointers potentially refer to different members of the same union (your example is a case in point).

So at this point, we can say that:

  • the code in your example is explicitly allowed by a non-normative footnote
  • the normative text on the other hand seems to disallow your example (due to strict aliasing), assuming that accessing one member of a union also constitutes access to another - but more on this shortly

Does accessing one member of a union actually access the others, though? If not, the strict aliasing rule isn't concerned with the example. (If it does, the strict aliasing rule, problematically, disallows just about any type-punning via a union).

A union is defined as (6.2.5 para 20):

A union type describes an overlapping nonempty set of member objects

And note that (6.7.2.1 para 16):

The value of at most one of the members can be stored in a union object at any time

Since access is (3):

〈execution-time action〉 to read or modify the value of an object

... and, since non-active union members do not have a stored value, then presumably accessing one member does not constitute access to the other members!

However, the definition of member access (6.5.2.3, quoted above) says "The value is that of the named member" (this is the precise statement that footnote 95 is attached to) - if the member has no value, what then? Footnote 95 gives an answer but as I've noted it is not supported by the normative text.

In any case, nothing in the text would seem to imply that reading or modifying a union member "via the member object" (i.e. directly via an expression using the member access operator) should be any different than reading or modifying it via pointer to that same member. The consensus understanding applied by compiler vendors, which allows them to perform optimisations under the assumption that pointers of different types do not alias, and that requires type punning be performed only via expressions involving member access, is not supported by the text of the standard.

If footnote 95 is considered normative, your example is perfectly fine code without undefined behaviour (unless the value of (*i) is a trap representation), according to the rest of the text. However, if footnote 95 is not considered normative, there is an attempted access to an object which has no stored value and the behaviour then is at best unclear (though the strict aliasing rule is arguably not relevant).

In the understanding of compiler vendors currently, your example has undefined behaviour, but since this isn't specified in the standard it's not clear exactly what constraint the code violates.

Personally, I think the "fix" to the standard is to:

  • disallow access to a non-active union member except via lvalue conversion of a member access expression, or via assignment where the left-hand-side is a member access expression (an exception to this could perhaps be made for when the member in question has character type, since that would not have an effect on possible optimisations due to a similar exception in the strict aliasing rule itself)
  • specify in the normative text that the value of a non-active member is as is currently described by footnote 95

That would make your example not a violation of the strict aliasing rule, but rather a violation of the constraint that a non-active union member must be accessed only via an expression containing the member access operator (and appropriate member).

Therefore, to answer your question - Is the strict aliasing rule incorrectly specified? - no, the strict aliasing rule is not relevant to this example because the objects accessed by the two pointer dereferences are separate objects and, even though they overlap in storage, only one of them has a value at a time. However, the union member access rules are incorrectly specified.

A note on Defect Report 236:

Arguments about union semantics invariably refer to DR 236 at some point. Indeed, your example code is superficially very similar to the code in that Defect Report. I would note that:

  1. The example in DR 236 is not about type-punning. It is about whether it is ok to assign to a non-active union member via a pointer to that member. The code in question is subtly different to that in the question here, since it does not attempt to access the "original" union member again after writing to the second member. Thus, despite the structural similarity in the example code, the Defect Report is largely unrelated to your question.
  2. "Committee believes that Example 2 violates the aliasing rules in 6.5 paragraph 7" - this indicates that the committee believes that writing a "non-active" union member, but not via an expression containing a member access of the union object, is a strict-aliasing violation. As I've detailed above, this is not supported by the text of the standard.
  3. "In order to not violate the rules, function f in example should be written as" - i.e. you must use the union object (and the "." operator) to change the active member type; this is in agreement with the "fix" to the standard I proposed above.
  4. The Committee Response in DR 236 claims that "Both programs invoke undefined behavior". It has no explanation for why the first does so, and its explanation for why the 2nd does so seems to be wrong.
like image 74
davmac Avatar answered Oct 15 '22 03:10

davmac


Under the definition of union members in §6.5.2.3:

3 A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. ...

4 A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. ...

See also §6.2.3 ¶1:

  • the members of structures or unions; each structure or union has a separate name space for its members (disambiguated by the type of the expression used to access the member via the . or -> operator);

It is clear that footnote 95 refers to the access of a union member with the union in scope and using the . or -> operator.

Since assignments and accesses to the bytes comprising the union are not made through union members but through pointers, your program does not invoke the aliasing rules of union members (including those clarified by footnote 95).

Further, normal aliasing rules are violated since the effective type of the object after *f = 1.0 is float, but its stored value is accessed by an lvalue of type int (see §6.5 ¶7).

Note: All references cite this C11 standard draft.

like image 37
Purag Avatar answered Oct 15 '22 05:10

Purag