I wrote a very simple program with AVX intrinsics, but I get different results when I compile it with the -O3 and -O1 options of g++. This is my code:
#include <immintrin.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
int d = 120;
__m256i r = _mm256_set1_epi32(d);
int * p = (int *) &r;
printf("r[0]: %d, ",p[0]);
printf("r[1]: %d, ",p[1]);
printf("r[2]: %d, ",p[2]);
printf("r[3]: %d, ",p[3]);
printf("r[4]: %d, ",p[4]);
printf("r[5]: %d, ",p[5]);
printf("r[6]: %d, ",p[6]);
printf("r[7]: %d \n",p[7]);
return 0;
}
This is the output when I compile with these options (g++ test1.c -o test1 -m64 -O3 -ffast-math -march=native -mavx):
r[0]: 0, r[1]: 0, r[2]: 4195520, r[3]: 0, r[4]: -1880829792, r[5]: 32767, r[6]: 0, r[7]: 0
And this is the output when I compile with these options (g++ test1.c -o test1 -m64 -O1 -ffast-math -march=native -mavx):
r[0]: 120, r[1]: 120, r[2]: 120, r[3]: 120, r[4]: 120, r[5]: 120, r[6]: 120, r[7]: 120
The second result (-O1) is correct, but the first is wrong. I don't know why this is happening.
Disabling strict aliasing will reduce performance in your whole program!
Casting &r to (int*) has no defined behavior. __m256i r is an AVX vector value that normally lives in a register and is not necessarily mapped to memory. By taking a pointer to it, you force the compiler to write it to memory, and by chance it may end up laid out as an int[8] array.
It may work with some compilers, with some options, and under some circumstances. However, you should not use this in your code as it may stop working with no warning.
The "defined behavior" way is:
int p[8];
_mm256_storeu_si256((__m256i*)p, r);
printf("r[0]: %d, ",p[0]);
printf("r[1]: %d, ",p[1]);
printf("r[2]: %d, ",p[2]);
printf("r[3]: %d, ",p[3]);
printf("r[4]: %d, ",p[4]);
printf("r[5]: %d, ",p[5]);
printf("r[6]: %d, ",p[6]);
printf("r[7]: %d \n",p[7]);
This explicitly writes the register to memory. It does the same thing, but always works regardless of compiler options. And since you avoid disabling strict aliasing, which would weaken optimization across the whole program, your program will also run faster.
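For reference, here is a minimal sketch of the explicit-store approach packaged as a small helper. The names broadcast_lanes and print_lanes are made up for illustration, and the target attribute is an assumption that you are on GCC or Clang: it enables AVX for that one function, so the file also builds without -mavx (with -mavx it is simply redundant).

```c
#include <assert.h>
#include <immintrin.h>
#include <stdio.h>

/* Hypothetical helper: broadcasts `value` into all eight 32-bit lanes
   and copies them to `out` with an explicit unaligned store.
   The target attribute is GCC/Clang-specific (an assumption here). */
__attribute__((target("avx")))
void broadcast_lanes(int value, int out[8])
{
    assert(out != NULL);
    __m256i r = _mm256_set1_epi32(value);
    /* Unaligned store, so `out` needs no special alignment. */
    _mm256_storeu_si256((__m256i *)out, r);
}

/* Hypothetical helper: prints the lanes in the question's format. */
void print_lanes(const int p[8])
{
    for (int i = 0; i < 8; ++i)
        printf("r[%d]: %d%s", i, p[i], i < 7 ? ", " : " \n");
}
```

Calling broadcast_lanes(120, p) on an int p[8] and then print_lanes(p) should print 120 for every lane at both -O1 and -O3, because the code never reads the vector through an int pointer.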
I just read your comment saying you already fixed the problem, but on the search engine it still shows up as "no answer", which is a bit misleading to people with similar issues. The original answer that was here was actually wrong, but the original poster hasn't changed the accepted answer to the right one yet, so I'll update this one.
The short answer is that casting &r to (int*) has no defined behaviour. Refer to galinette's answer for more details.
The defined behaviour way to do this is to explicitly write the register to memory:
int p[8];
_mm256_storeu_si256((__m256i*)p, r);
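As an alternative that is not from the answers here, copying the bytes with memcpy is another commonly used way to read a vector's lanes without aliasing problems. A rough sketch, assuming GCC or Clang (for the target attribute) and a hypothetical helper name copy_lanes_memcpy:

```c
#include <assert.h>
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: broadcasts `value` into a 256-bit vector and
   copies its 32 bytes into `out` with memcpy instead of a pointer cast.
   The target attribute is GCC/Clang-specific (an assumption here). */
__attribute__((target("avx")))
void copy_lanes_memcpy(int value, int out[8])
{
    assert(out != NULL);
    __m256i r = _mm256_set1_epi32(value);
    /* memcpy copies the object representation byte by byte, which is
       allowed for any object type; sizeof r is 32 bytes. */
    memcpy(out, &r, sizeof r);
}
```

Optimizing compilers typically turn a fixed-size memcpy like this into a single vector store, so it is not expected to cost more than the intrinsic, though that depends on the compiler and flags.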