We are running a scientific program and would like to use AVX. The whole program (written in Fortran and C) is going to be vectorized, and at the moment I am trying to implement complex number multiplication in GCC inline assembly.
The assembly code takes 4 complex numbers and performs two complex multiplications at once:
    v2complex cmult(v2complex *a, v2complex *b) {
        v2complex ret;
        asm (
            "vmovupd %2,%%ymm1;"
            "vmovupd %2, %%ymm2;"
            "vmovddup %%ymm2, %%ymm2;"
            "vshufpd $15,%%ymm1,%%ymm1,%%ymm1;"
            "vmulpd %1, %%ymm2, %%ymm2;"
            "vmulpd %1, %%ymm1, %%ymm1;"
            "vshufpd $5,%%ymm1,%%ymm1, %%ymm1;"
            "vaddsubpd %%ymm1, %%ymm2,%%ymm1;"
            "vmovupd %%ymm1, %0;"
            :
            "=m"(ret)
            :
            "m" (*a),
            "m" (*b)
        );
        return ret;
    }
where a and b are 256-bit values, each holding two double-precision complex numbers:
    typedef union v2complex {
        __m256d v;
        complex c[2];
    } v2complex;
The problem is that the code mostly produces the correct result, but sometimes it fails.
I am very new to assembly, but I tried to figure it out myself. It seems that the C program (compiled with -O3) interacts with the ymm registers used in the assembly code. For instance, if I printf one of the values (e.g. a) before executing the multiplication, the program never gives wrong results.
My question is how to tell GCC not to interact with the ymm registers. I did not manage to put them in the clobbered registers list.
As you surmise, the problem is that you haven’t told GCC which registers you are clobbering. I’m surprised if they don’t yet support putting YMM registers in the clobber list; what version of GCC are you using?
In any event, it will almost certainly suffice to put the corresponding XMM registers in the clobber list instead:

    : "=m" (ret) : "m" (*a), "m" (*b) : "%xmm1", "%xmm2");
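
Put together, the function with the clobber list might look like the sketch below. It assumes x86-64 with AVX and GCC-style inline assembly. The struct is a simplified, hypothetical stand-in for the question's union (the __m256d member is dropped so the snippet builds without -mavx), and the instruction sequence is the one from the question.

```c
#include <complex.h>

/* Hypothetical stand-in for the question's union: same 32-byte layout,
   but without the __m256d member, so no <immintrin.h> or -mavx is needed. */
typedef struct { double complex c[2]; } v2complex;

/* Multiplies a->c[0]*b->c[0] and a->c[1]*b->c[1] in one pass.
   Assumes an x86-64 CPU with AVX. */
static v2complex cmult(const v2complex *a, const v2complex *b) {
    v2complex ret;
    __asm__ (
        "vmovupd   %2, %%ymm1\n\t"                  /* ymm1 = b                  */
        "vmovupd   %2, %%ymm2\n\t"                  /* ymm2 = b                  */
        "vmovddup  %%ymm2, %%ymm2\n\t"              /* ymm2 = (b.re, b.re) pairs */
        "vshufpd   $15, %%ymm1, %%ymm1, %%ymm1\n\t" /* ymm1 = (b.im, b.im) pairs */
        "vmulpd    %1, %%ymm2, %%ymm2\n\t"          /* ymm2 = a * b.re           */
        "vmulpd    %1, %%ymm1, %%ymm1\n\t"          /* ymm1 = a * b.im           */
        "vshufpd   $5, %%ymm1, %%ymm1, %%ymm1\n\t"  /* swap re/im within pairs   */
        "vaddsubpd %%ymm1, %%ymm2, %%ymm1\n\t"      /* re: subtract, im: add     */
        "vmovupd   %%ymm1, %0"
        : "=m"(ret)
        : "m"(*a), "m"(*b)
        : "xmm1", "xmm2");  /* tells GCC that registers 1 and 2 are overwritten */
    return ret;
}
```

Naming xmm1 and xmm2 covers ymm1 and ymm2 as well, since each XMM register is the low half of the corresponding YMM register and GCC tracks them as one register.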
Some other notes:
- I would use "r" (a), "r" (b) as constraints and write my loads like vmovupd (%2), %%ymm1. Probably no difference in the generated code, but it seems more idiomatically correct. (With the pointers passed in registers, GCC no longer sees which memory the asm reads, so a "memory" clobber is needed as well.)
- Use vzeroupper following AVX code before any SSE code is executed to avoid (large) stalls.

I add two comments, not directly answering your question:
- ... src/special/complexvec.h in the zipped source code.
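
The notes above on register operands and vzeroupper can be combined into a sketch such as the following. It assumes x86-64 with AVX; cmult_reg and the simplified struct are hypothetical names, not taken from the original code. Two details go beyond the wording of the notes: a "memory" clobber, because with "r" operands GCC cannot tell which memory the asm reads, and all sixteen XMM registers in the clobber list, because vzeroupper zeroes the upper half of every YMM register.

```c
#include <complex.h>

/* Hypothetical stand-in for the question's union (no __m256d member),
   so the sketch builds without <immintrin.h> or -mavx. */
typedef struct { double complex c[2]; } v2complex;

/* Same multiplication as in the question, but with the input pointers
   passed in general-purpose registers. Assumes x86-64 with AVX. */
static v2complex cmult_reg(const v2complex *a, const v2complex *b) {
    v2complex ret;
    __asm__ (
        "vmovupd   (%2), %%ymm1\n\t"                /* ymm1 = *b                 */
        "vmovupd   (%2), %%ymm2\n\t"                /* ymm2 = *b                 */
        "vmovddup  %%ymm2, %%ymm2\n\t"              /* ymm2 = (b.re, b.re) pairs */
        "vshufpd   $15, %%ymm1, %%ymm1, %%ymm1\n\t" /* ymm1 = (b.im, b.im) pairs */
        "vmulpd    (%1), %%ymm2, %%ymm2\n\t"        /* ymm2 = *a * b.re          */
        "vmulpd    (%1), %%ymm1, %%ymm1\n\t"        /* ymm1 = *a * b.im          */
        "vshufpd   $5, %%ymm1, %%ymm1, %%ymm1\n\t"  /* swap re/im within pairs   */
        "vaddsubpd %%ymm1, %%ymm2, %%ymm1\n\t"      /* re: subtract, im: add     */
        "vmovupd   %%ymm1, %0\n\t"
        "vzeroupper"                                /* clear upper YMM halves    */
        : "=m"(ret)
        : "r"(a), "r"(b)
        /* vzeroupper touches every YMM register, so all are listed;
           "memory" covers the loads through the raw pointers. */
        : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
          "memory");
    return ret;
}
```

Whether the explicit vzeroupper is needed depends on how the surrounding code is compiled. When the whole translation unit is built with AVX enabled, GCC normally inserts it on its own where required.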