
Getting wrong results when using AVX instructions with the -O3 compile option

I wrote a very simple program using AVX intrinsics, but I get different results depending on whether I compile it with the -O3 or the -O1 option of g++. This is my code:

#include <immintrin.h>
#include <stdio.h>

int main(int argc, char *argv[])
{

    int d = 120;
    __m256i r = _mm256_set1_epi32(d);
    int * p = (int *) &r;

    printf("r[0]: %d, ",p[0]);
    printf("r[1]: %d, ",p[1]);
    printf("r[2]: %d, ",p[2]);
    printf("r[3]: %d, ",p[3]);
    printf("r[4]: %d, ",p[4]);
    printf("r[5]: %d, ",p[5]);
    printf("r[6]: %d, ",p[6]);
    printf("r[7]: %d \n",p[7]);                    

    return 0;
}

This is the output when I compile with these options (g++ test1.c -o test1 -m64 -O3 -ffast-math -march=native -mavx):

r[0]: 0, r[1]: 0, r[2]: 4195520, r[3]: 0, r[4]: -1880829792, r[5]: 32767, r[6]: 0, r[7]: 0

And this is the output when I compile with these options (g++ test1.c -o test1 -m64 -O1 -ffast-math -march=native -mavx):

r[0]: 120, r[1]: 120, r[2]: 120, r[3]: 120, r[4]: 120, r[5]: 120, r[6]: 120, r[7]: 120

The second result (-O1) is correct, but the first is wrong. I don't know why this is happening.

user3687068 asked Mar 17 '23 14:03


2 Answers

Disabling strict aliasing will reduce performance in your whole program!

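Casting &r to (int*) and reading through that pointer has undefined behavior: it violates the strict aliasing rule. GCC enables -fstrict-aliasing at -O2 and above, which is why the -O1 build happens to print the expected values while the -O3 build does not. __m256i r is a vector value that normally lives in an AVX register and is not necessarily mapped to memory. By taking a pointer to it, you force the compiler to write it to memory, and only by chance does it end up readable as an int[8] array.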

It may work with some compilers, with some options, and under some circumstances. However, you should not use this in your code as it may stop working with no warning.

The "defined behavior" way is:

int p[8];
_mm256_storeu_si256((__m256i*)p, r);
printf("r[0]: %d, ",p[0]);
printf("r[1]: %d, ",p[1]);
printf("r[2]: %d, ",p[2]);
printf("r[3]: %d, ",p[3]);
printf("r[4]: %d, ",p[4]);
printf("r[5]: %d, ",p[5]);
printf("r[6]: %d, ",p[6]);
printf("r[7]: %d \n",p[7]); 

This way you explicitly write the register to memory. The result is the same, but it works regardless of compiler options. And since you no longer need to disable strict aliasing, which lowers optimization across the whole program, your program may even run faster.

galinette answered Mar 19 '23 05:03


I just read your comment saying you already fixed the problem, but on the search engine it still shows up as "no answer", which is a bit misleading to people with similar issues. The original answer that was here was actually wrong, but the original poster hasn't changed the accepted answer to the right one yet, so I'll update this one.

The short answer is that casting &r to (int*) has no defined behaviour. Refer to galinette's answer for more details.

The defined behaviour way to do this is to explicitly write the register to memory:

int p[8];
_mm256_storeu_si256((__m256i*)p, r);

Louis answered Mar 19 '23 03:03