I am seeing a really weird bug with Intel intrinsics in an AVX2 function, and I want to share it here. Either I am doing something wrong (I cannot see what at this point), or it is a bug in the library.
I have this simple code in my main.c:
__int64 test = 0xFFFF'FFFF'FFFF'FFFF;
__m256i ymm = _mm256_set_epi64x(0x0000'0000'0000'0000,
                                0x0000'0000'0000'0000,
                                0x0000'0000'0000'0000,
                                test);
For some strange reason, the value assigned to ymm is:
ymm.m256i_i64[0] = 0xffff'ffff'ffff'ffff
ymm.m256i_i64[1] = 0x0000'0000'0000'0000
ymm.m256i_i64[2] = 0x0000'ffff'0000'0000
ymm.m256i_i64[3] = 0x0000'0000'0000'0000
I have been debugging this for hours, but I cannot see why ymm.m256i_i64[2] gets this rogue value. Please help!
Oddly, if I write this C code instead:
__m256i ymm = _mm256_set_epi64x(0x0000'0000'0000'0000,
                                0x0000'0000'0000'0000,
                                0x0000'0000'0000'0000,
                                0xFFFF'FFFF'FFFF'FFFF);
then the values are set correctly:
ymm.m256i_i64[0] = 0xffff'ffff'ffff'ffff
ymm.m256i_i64[1] = 0x0000'0000'0000'0000
ymm.m256i_i64[2] = 0x0000'0000'0000'0000
ymm.m256i_i64[3] = 0x0000'0000'0000'0000
Note: I am using Visual Studio, both its compiler and its debugger, as the example picture below shows:
The printf following the code printed: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ff ff ff 00 ff ff 00 00 ff 00 00 00 ff 00 00 00.
The rogue values in the other elements also seem to vary: they were different after I added the loop from what they were before (I don't know whether the loop itself caused the change).
Edit: I am not an assembly expert, not at all. I added the generated assembly code in the picture below, though, in case it helps anyone explain what's going on and whether this is a bug not caused by me:
Until recently, MSVC did not properly support any of the epi64x intrinsics in 32-bit mode. In Agner Fog's VCL library, he writes:
//#if defined (_MSC_VER) && _MSC_VER < 1900 && ! defined (__x86_64__) && ! defined(__INTEL_COMPILER)
// MS compiler cannot use _mm256_set1_epi64x in 32 bit mode, and
// cannot put 64-bit values into xmm register without using
// mmx registers, and it makes no emms
To work around this in 32-bit mode with MSVC, you can do this:
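#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper name. Same result as _mm256_set_epi64x(d, c, b, a): a is the lowest element. */
static __m256i set_epi64x_32bit(int64_t a, int64_t b, int64_t c, int64_t d) {
    union {
        int64_t q[4];
        int32_t r[8];
    } u;
    u.q[0] = a; u.q[1] = b; u.q[2] = c; u.q[3] = d;
    return _mm256_setr_epi32(u.r[0], u.r[1], u.r[2], u.r[3], u.r[4], u.r[5], u.r[6], u.r[7]);
}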
Or use 64-bit mode.