I am currently wondering about the rationale behind the strict aliasing rule. I understand that certain aliasing is not allowed in C and that the intention is to allow optimizations, but I am surprised that this was the preferred solution over tracing type casts when the standard was defined.
So, apparently the following example violates the strict aliasing rule:
    #include <stdint.h>

    uint64_t swap(uint64_t val)
    {
        uint64_t copy = val;
        uint32_t *ptr = (uint32_t*)&copy; // strict aliasing violation
        uint32_t tmp = ptr[0];
        ptr[0] = ptr[1];
        ptr[1] = tmp;
        return copy;
    }
I might be wrong, but as far as I can see, a compiler should easily be able to trace the type casts and avoid optimizations on objects that are cast explicitly (just as it avoids such optimizations on same-type pointers), including in anything called with the affected values.
So, what problem does the strict aliasing rule address that a compiler cannot easily solve on its own by automatically detecting where the optimizations are safe?
When strict aliasing is enabled (GCC turns on -fstrict-aliasing at -O2 and above), the compiler assumes that pointers to incompatible types never point to the same memory location, that is, that they do not alias each other. This assumption lets the compiler optimize the code more aggressively.
Sometimes we want to circumvent the type system and interpret an object as a different type. This is called type punning: reinterpreting a segment of memory as another type. Many of the common ways of doing this violate the strict aliasing rule.
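One way to type pun without violating the rule is to copy the bytes with memcpy instead of accessing the object through a pointer of a different type. The following is a minimal sketch of the swap from the question rewritten that way; the name swap_halves is hypothetical and not from the original post.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical rewrite of swap(): exchange the two 32-bit halves of a
   64-bit value without ever reading a uint64_t through a uint32_t*. */
uint64_t swap_halves(uint64_t val)
{
    uint32_t halves[2];

    /* Copy the object representation into a real uint32_t array. */
    memcpy(halves, &val, sizeof val);

    uint32_t tmp = halves[0];
    halves[0] = halves[1];
    halves[1] = tmp;

    /* Copy the bytes back into the uint64_t. */
    memcpy(&val, halves, sizeof val);
    return val;
}
```

Optimizing compilers commonly turn small fixed-size memcpy calls like these into plain register moves, so this form usually costs nothing compared with the pointer cast.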
An alias occurs when different variables point directly or indirectly to a single area of storage. Aliasing refers to assumptions made during optimization about which variables can point to or occupy the same storage area.
If any pointer could alias any other pointer in C, the compiler would have to assume that the memory regions accessed through these pointers can overlap, which prevents many possible optimizations. The strict aliasing rule enables more optimizations, as pointer arguments will not be treated as possible aliases if they point to different (incompatible) types.
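To see why the compiler has to be careful when overlap is possible, here is a small sketch with two pointers of the same type, which are always allowed to alias. The function name add_twice is an assumption made up for this illustration.

```c
/* Hypothetical example: both parameters point to int, so they may
   legally refer to the same object. The compiler therefore has to
   reload *in after the first store to *out. */
int add_twice(int *out, const int *in)
{
    *out += *in;   /* may change *in if out and in alias */
    *out += *in;   /* must use the current value of *in */
    return *out;
}
```

Called with two distinct objects that both hold 1, this returns 3. Called as add_twice(&x, &x) with x equal to 1, it returns 4, because the first addition changes the value that the second one reads. A compiler that cached *in in a register would get the second case wrong.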
Since, in this example, all the code is visible to a compiler, a compiler can, hypothetically, determine what is requested and generate the desired assembly code. However, demonstration of one situation in which a strict aliasing rule is not theoretically needed does nothing to prove there are not other situations where it is needed.
Consider if the code instead contains the call foo(&copy, ptr), where foo is declared as void foo(uint64_t *a, uint32_t *b). Then, inside foo, which may be in another translation unit, the compiler would have no way of knowing that a and b point to (parts of) the same object.
Then there are two choices. One, the language may permit aliasing, in which case the compiler, while translating foo, cannot make optimizations relying on the fact that *a and *b are different. For example, whenever something is written to *b, the compiler must generate assembly code to reload *a, since it may have changed. Optimizations such as keeping a copy of *a in registers while working with it would not be allowed.
Two, the language may prohibit aliasing (specifically, not define the behavior of a program that does it). In this case, the compiler can make optimizations relying on the fact that *a and *b are different.
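The difference between the two choices can be sketched with a possible body for such a function. The original only gives a declaration, so the name foo_sketch, the body, and the uint64_t return type (added so the effect is observable) are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical function with the same parameter types as foo. */
uint64_t foo_sketch(uint64_t *a, uint32_t *b)
{
    uint64_t before = *a;  /* first load of *a */
    *b = 0;                /* store through a pointer to a different type */

    /* If aliasing were permitted, *a would have to be reloaded here,
       because the store to *b might have changed it. Under the strict
       aliasing rule the compiler may reuse the value already loaded
       and compute 2 * before instead. */
    return before + *a;
}
```

With two separate objects, for example a uint64_t holding 21 and an unrelated uint32_t, the call returns 42 under either choice. Passing a pointer into the same object for b, as in the question, has undefined behavior, and that is exactly what allows the compiler to skip the reload.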
The C committee chose option two because it offers better performance while not unduly restricting programmers.