
Is strict aliasing one-way?

I believe 6.5p7 in the C standard defines the so-called strict aliasing rule as follows.

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  1. a type compatible with the effective type of the object,
  2. a qualified version of a type compatible with the effective type of the object,
  3. a type that is the signed or unsigned type corresponding to the effective type of the object,
  4. a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  5. an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  6. a character type.

Here's a simple example of an optimization GCC performs on the assumption that the rule is obeyed.

int IF(int *i, float *f) {
    *i = -1;
    *f = 0;
    return *i;
}

IF:
        mov     DWORD PTR [rdi], -1
        mov     eax, -1
        mov     DWORD PTR [rsi], 0x00000000
        ret

GCC omits the load for return *i, assuming that an int and a float cannot alias.

Then let's consider case 6, where it says an object could be accessed by a character type lvalue expression (char *).

int IC(int *i, char *c) {
    *i = -1;
    *c = 0;
    return *i;
}

IC:
        mov     DWORD PTR [rdi], -1
        mov     BYTE PTR [rsi], 0
        mov     eax, DWORD PTR [rdi]
        ret

Now there is a load for return *i because i and c could overlap according to the rules, and *c = 0 could change what's in *i.

Then can we also modify a char through an int *? Should the compiler care that such a thing might happen?

char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}

CI: #GCC
        mov     BYTE PTR [rdi], -1
        mov     DWORD PTR [rsi], 0
        movzx   eax, BYTE PTR [rdi]
        ret

CI: #Clang
        mov     byte ptr [rdi], -1
        mov     dword ptr [rsi], 0
        mov     al, byte ptr [rdi]
        ret

Looking at the assembly output, both GCC and Clang seem to think a char can be modified by access through int *.

Maybe it's obvious that A and B overlapping means A overlaps B and B overlaps A. However, I found this detailed answer which emphasizes in boldface that,

Note that may_alias, like the char* aliasing rule, only goes one way: it is not guaranteed to be safe to use int32_t* to read a __m256. It might not even be safe to use float* to read a __m256. Just like it's not safe to do char buf[1024]; int *p = (int*)buf;.

Now I am really confused. The answer is also about GCC vector types, which have a may_alias attribute so that they can alias other types much as char can.

At least in the following example, GCC seems to think overlapping access can happen in both directions.

#include <emmintrin.h> /* SSE2: __m128i, _mm_setzero_si128, _mm_set1_epi32 */

int IV(int *i, __m128i *v) {
    *i = -1;
    *v = _mm_setzero_si128();
    return *i;
}

__m128i VI(int *i, __m128i *v) {
    *v = _mm_set1_epi32(-1);
    *i = 0;
    return *v;
}

IV:
        pxor    xmm0, xmm0
        mov     DWORD PTR [rdi], -1
        movaps  XMMWORD PTR [rsi], xmm0
        mov     eax, DWORD PTR [rdi]
        ret
VI:
        pcmpeqd xmm0, xmm0
        movaps  XMMWORD PTR [rsi], xmm0
        mov     DWORD PTR [rdi], 0
        movdqa  xmm0, XMMWORD PTR [rsi]
        ret

https://godbolt.org/z/ab5EMx3bb

But am I missing something? Is strict aliasing one-way?


Additionally, after reading the current answers and comments, I thought maybe this code is not allowed by the standard.

typedef struct {int i;} S;
S s;
int *p = (int *)&s;
*p = 1;

Note that (int *)&s is different from &s.i. My current interpretation is that an object of type S is being accessed by an lvalue expression of type int, and this case is not listed in 6.5p7.

xiver77 asked Nov 16 '25 23:11


2 Answers

Yes, it's only one way, but from inside the function the compiler can't tell which side the pointers came from.

Given this:

char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}

It could have been called like this:

int a;
char *p = ((char *)&a) + 1;
char b = CI(p,&a);

Which is a valid use of aliasing. So from inside of the function, *i = 0 is correctly setting a in the calling function, and *c = -1 is correctly setting one byte inside of a.
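
As a minimal sketch, the call above can be wrapped in a small helper to see what a conforming compiler has to produce. The helper name call_ci_overlapping and the initial value of a are made up for illustration, and the code assumes sizeof(int) is at least 2 so that byte 1 exists.

```c
/* Same function as in the answer above. */
char CI(char *c, int *i) {
    *c = -1;   /* writes one byte, possibly inside *i */
    *i = 0;    /* overwrites every byte of *i, including the one c points to */
    return *c; /* must be reloaded: the int store may have changed it */
}

/* Hypothetical helper: calls CI with c pointing into the int that i points to.
   Assumes sizeof(int) >= 2. */
char call_ci_overlapping(void) {
    int a = 0x01010101;
    char *p = (char *)&a + 1; /* second byte of a: a valid char access to an int */
    return CI(p, &a);
}
```

Because *i = 0 clears the byte that c points to, the helper has to return 0. A compiler that kept the stored -1 in a register instead of reloading *c would return -1 here, which is why both GCC and Clang emit the load.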

dbush answered Nov 18 '25 12:11


You can take a pointer to any object, cast it to char*, and use that to access the bit patterns underlying that object. You can also cast a char* obtained this way back to its original type.

So when the compiler sees int *i and char *p, it cannot exclude the possibility that p was created by casting from i. They may point to the same raw memory, so changing one may change the other. In that sense it goes both ways. But that is not what the quoted text is about.
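
A small sketch of that round trip, with a hypothetical helper name (clear_via_bytes): the int is modified byte by byte through an unsigned char*, and the pointer is then cast back to the original type to read the result.

```c
#include <stddef.h>

/* Hypothetical helper: zero every byte of *i through a character pointer,
   then read the int back through a pointer of its original type. */
int clear_via_bytes(int *i) {
    unsigned char *bytes = (unsigned char *)i; /* int* -> unsigned char*: allowed */
    for (size_t k = 0; k < sizeof *i; ++k)
        bytes[k] = 0;                          /* each byte store may change *i */
    int *back = (int *)bytes;                  /* unsigned char* -> int*: the original type */
    return *back;                              /* all bits zero represents the value 0 */
}
```

Since the byte stores are allowed to modify the int, the compiler cannot reuse an earlier value of *i after the loop.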

What the quoted text is about is casting from A* to char* and then to B*. The object pointed to doesn't magically become a B, and accessing it through a B* is undefined behavior. Maybe one-way is the wrong word; I don't know a better name for it. But for every object there is a train line with only two stops: A* and char* (unsigned char*, signed char*, const char*, and all its other variants). You can go back and forth as many times as you like, but you can never change tracks and go to B*.
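
To make the track change concrete, here is a sketch with A = float and B = uint32_t. The helper name float_bits is made up, and the code assumes float is 32 bits wide. Reading the float through a uint32_t* would be the forbidden hop to B*; copying the bytes with memcpy is a commonly used well-defined alternative.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the object representation of f as a 32-bit integer.
   Assumes float is 32 bits wide (checked at compile time). */
uint32_t float_bits(float f) {
    _Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
    uint32_t u;
    /* Not: u = *(uint32_t *)&f;
       That would access a float object through a uint32_t lvalue. */
    memcpy(&u, &f, sizeof u); /* byte-wise copy, no aliasing violation */
    return u;
}
```

On a platform that uses IEEE 754 single precision, float_bits(1.0f) should give 0x3f800000, but the exact value depends on the floating-point format.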

Does that help?

The may_alias attribute sets up another such rail system, allowing aliasing between int[4] and __m128i* because that is exactly the overlap the compiler needs for vectorization. But that's something you have to look up in the compiler's documentation.
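
As a rough illustration of the attribute outside of vector types, the sketch below declares a may_alias variant of int. This is a GCC/Clang extension, not standard C; the names int_alias and zero_float_bits are made up, and the code assumes sizeof(int) equals sizeof(float). GCC documents that accesses through such a type are not subject to type-based alias analysis.

```c
/* GCC/Clang extension: an int type that may alias any other type, like char. */
typedef int __attribute__((__may_alias__)) int_alias;

/* Hypothetical helper: overwrite the bits of a float through an int_alias lvalue.
   Assumes sizeof(int) == sizeof(float) (checked at compile time). */
float zero_float_bits(float f) {
    _Static_assert(sizeof(int) == sizeof(float), "int and float must have equal size");
    int_alias *p = (int_alias *)&f; /* float* -> int_alias*: permitted by the attribute */
    *p = 0;                         /* the compiler must assume this store changes f */
    return f;
}
```

The permission belongs to int_alias, not to float: as the quoted text says, it is not guaranteed to be safe in the other direction, for example using a plain float* to read an object whose type carries the attribute.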

Goswin von Brederlow answered Nov 18 '25 14:11


