MSVC and ICC both support the intrinsics _addcarry_u64 and _addcarryx_u64.
According to Intel's Intrinsic Guide and white paper, these should map to adcx and adox, respectively. However, the generated assembly makes it clear that they map to adc and adcx, respectively, and no intrinsic maps to adox.
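Here is a minimal sketch of one carry chain built from _addcarry_u64. It is a multi-limb addition that returns the final carry. The helper name mp_add is hypothetical. The header choices are assumptions about GCC/Clang/ICC on Linux and MSVC on x64. The plain-C branch exists only so the sketch also builds on targets without the intrinsic.

```c
#include <stddef.h>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>      /* MSVC declares _addcarry_u64 here */
#define HAVE_ADDCARRY_U64 1
#elif defined(__x86_64__)
#include <x86intrin.h>   /* assumed header for GCC, Clang and ICC on Linux */
#define HAVE_ADDCARRY_U64 1
#endif

/* Hypothetical helper: r = a + b over n 64-bit limbs, least significant
   limb first. Returns the carry out of the most significant limb. */
static unsigned char mp_add(unsigned long long *r, const unsigned long long *a,
                            const unsigned long long *b, size_t n)
{
    unsigned char carry = 0;
    for (size_t i = 0; i < n; i++) {
#ifdef HAVE_ADDCARRY_U64
        /* Each call consumes the previous carry and produces the next one. */
        carry = _addcarry_u64(carry, a[i], b[i], &r[i]);
#else
        /* Portable fallback for targets without the intrinsic. */
        unsigned long long s = a[i] + b[i];
        unsigned char c = s < a[i];
        r[i] = s + carry;
        carry = (unsigned char)(c | (r[i] < s));
#endif
    }
    return carry;
}
```

Whichever instruction the compiler emits, every step depends on the previous step's carry. This is a single serial chain.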
Additionally, telling the compiler to enable AVX2 with /arch:AVX2 in MSVC or -march=core-avx2 with ICC on Linux makes no difference. I'm not sure how to enable ADX with MSVC and ICC.
The documentation for MSVC lists _addcarryx_u64 with the technology of ADX, whereas _addcarry_u64 has no listed technology. However, the link in MSVC's documentation for these intrinsics goes directly to the Intel Intrinsic Guide, which contradicts MSVC's own documentation and the generated assembly.
From this I conclude that Intel's Intrinsic guide and white paper are wrong.
This makes some sense for MSVC. Since it does not allow inline assembly in 64-bit mode, it should provide a way to use adc, which it does with _addcarry_u64.
One of the big advantages of adcx and adox is that they operate on different flags (carry CF and overflow OF). This allows two independent parallel carry chains. However, since there is no intrinsic for adox, how is this possible? With ICC, at least, one can use inline assembly, but this is not possible with MSVC in 64-bit mode.
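Where GNU-style inline assembly is available (GCC, Clang and ICC on Linux, but not MSVC for x64), the two chains can be written out by hand. The following is a minimal sketch under those assumptions:

- dual_add4 and cpu_has_adx are hypothetical names.
- The operands are fixed at four 64-bit limbs so there is no loop. A loop counter would need flag-preserving instructions such as lea and jrcxz.
- adcx and adox raise an invalid-opcode exception on CPUs without ADX. The sketch therefore checks CPUID first and otherwise falls back to plain C.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <cpuid.h>
#define HAVE_ADX_ASM 1

/* ADX support is reported in CPUID.(EAX=7,ECX=0):EBX bit 19. */
static int cpu_has_adx(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax; (void)ecx; (void)edx;
    return (int)((ebx >> 19) & 1u);
}
#endif

/* Hypothetical helper: r1 = a + b and r2 = c + d for 256-bit numbers
   (4 limbs, least significant first). Chain 1 lives in CF (adcx) and
   chain 2 in OF (adox), so the two additions interleave instruction by
   instruction. The carry out of each chain is stored in *cy1 / *cy2. */
static void dual_add4(uint64_t r1[4], const uint64_t a[4], const uint64_t b[4],
                      uint64_t r2[4], const uint64_t c[4], const uint64_t d[4],
                      unsigned *cy1, unsigned *cy2)
{
    uint64_t x[4] = {a[0], a[1], a[2], a[3]};
    uint64_t y[4] = {c[0], c[1], c[2], c[3]};
    unsigned f1 = 0, f2 = 0;
#ifdef HAVE_ADX_ASM
    if (cpu_has_adx()) {
        __asm__(
            "xorl %k[f1], %k[f1]\n\t"    /* f1 = 0; also clears CF and OF */
            "adcx   (%[b]), %[x0]\n\t"   /* chain 1: reads/writes CF only */
            "adox   (%[d]), %[y0]\n\t"   /* chain 2: reads/writes OF only */
            "adcx  8(%[b]), %[x1]\n\t"
            "adox  8(%[d]), %[y1]\n\t"
            "adcx 16(%[b]), %[x2]\n\t"
            "adox 16(%[d]), %[y2]\n\t"
            "adcx 24(%[b]), %[x3]\n\t"
            "adox 24(%[d]), %[y3]\n\t"
            "movl $0, %k[f2]\n\t"        /* mov leaves the flags untouched */
            "setc %b[f1]\n\t"            /* final carry of chain 1 */
            "seto %b[f2]"                /* final carry of chain 2 */
            : [x0] "+r"(x[0]), [x1] "+r"(x[1]), [x2] "+r"(x[2]), [x3] "+r"(x[3]),
              [y0] "+r"(y[0]), [y1] "+r"(y[1]), [y2] "+r"(y[2]), [y3] "+r"(y[3]),
              [f1] "=&r"(f1), [f2] "=&r"(f2)
            : [b] "r"(b), [d] "r"(d)
            : "cc", "memory");
    } else
#endif
    {
        /* Portable fallback: the same two chains with explicit carry bits. */
        for (int i = 0; i < 4; i++) {
            uint64_t s = x[i] + b[i];
            unsigned o = s < x[i];
            x[i] = s + f1;
            f1 = o | (x[i] < s);
            s = y[i] + d[i];
            o = s < y[i];
            y[i] = s + f2;
            f2 = o | (y[i] < s);
        }
    }
    for (int i = 0; i < 4; i++) {
        r1[i] = x[i];
        r2[i] = y[i];
    }
    *cy1 = f1;
    *cy2 = f2;
}
```

Neither chain ever reads the other's flag. On an out-of-order core, the adcx and adox instructions can therefore execute in parallel rather than forming one long dependency chain.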
Microsoft's and Intel's documentation (both the white paper and the online Intrinsic Guide) now agree. The _addcarry_u64 intrinsic documentation says it produces only adc. The _addcarryx_u64 intrinsic can produce either adcx or adox. With MSVC 2013 and 2015, however, _addcarryx_u64 only produces adcx. ICC produces both.
They map to adc, adcx AND adox. The compiler decides which instructions to use based on how you use them. If you perform two big-int additions in parallel, the compiler will use adcx and adox for higher throughput. For example:
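// _addcarry_u64 comes from <x86intrin.h> (GCC/Clang) or <intrin.h> (MSVC);
// res, a and b are assumed to be arrays of 100 unsigned long long limbs
unsigned char c1 = 0, c2 = 0;   // one carry per chain
for (int i = 0; i < 100; i++) {
    c1 = _addcarry_u64(c1, res[i], a[i], &res[i]);   // chain 1: res += a
    c2 = _addcarry_u64(c2, res[i], b[i], &res[i]);   // chain 2: res += b
}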
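The loop above can be turned into a self-contained function for experiments. For example, compile it with -O2 -madx -S and check which of adc, adcx and adox actually appear. This is a sketch under stated assumptions:

- add_two_chains and ADD_STEP are hypothetical names.
- _addcarryx_u64 is used only when __ADX__ is defined, which is the case for GCC, Clang and ICC builds with ADX enabled.
- MSVC and other x86-64 builds use _addcarry_u64.
- Non-x86 targets get a plain-C carry step.

Whether adox shows up is up to the compiler, as the other answers here note.

```c
#include <stddef.h>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#define ADD_STEP _addcarry_u64      /* swap in _addcarryx_u64 only if the CPU has ADX */
#elif defined(__x86_64__) && defined(__ADX__)
#include <x86intrin.h>
#define ADD_STEP _addcarryx_u64     /* build targets ADX, e.g. -madx */
#elif defined(__x86_64__)
#include <x86intrin.h>
#define ADD_STEP _addcarry_u64
#else
/* Portable carry step: *out = x + y + c, returns the carry out. */
static unsigned char add_step(unsigned char c, unsigned long long x,
                              unsigned long long y, unsigned long long *out)
{
    unsigned long long s = x + y;
    unsigned char o = s < x;
    *out = s + c;
    return (unsigned char)(o | (*out < s));
}
#define ADD_STEP add_step
#endif

/* Hypothetical helper: the answer's loop as a function over n limbs.
   res += a is carried in c1 and res += b in c2; the two chains share
   res[i] but never a carry bit. */
static void add_two_chains(unsigned long long *res, const unsigned long long *a,
                           const unsigned long long *b, size_t n,
                           unsigned char *c1_out, unsigned char *c2_out)
{
    unsigned char c1 = 0, c2 = 0;
    for (size_t i = 0; i < n; i++) {
        c1 = ADD_STEP(c1, res[i], a[i], &res[i]);
        c2 = ADD_STEP(c2, res[i], b[i], &res[i]);
    }
    *c1_out = c1;
    *c2_out = c2;
}
```

The total carry out of the combined operation res + a + b is c1 + c2, because each chain drops its own overflow bit.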
Relatedly, GCC does not support ADOX and ADCX at the moment. "At the moment" includes GCC 6.4 (Fedora 25) and GCC 7.1 (Fedora 26). GCC effectively disabled the intrinsics, but it still advertises support by defining __ADX__ in the preprocessor. Also see Issue 67317, Silly code generation for _addcarry_u32/_addcarry_u64. Many thanks to Xi Ruoyao for finding the issue.
According to Uros Bizjak on the GCC Help mailing list, GCC may never support the intrinsics. Also see GCC does not generate ADCX or ADOX for _addcarryx_u64.
Clang has its own set of issues with respect to ADOX and ADCX. Clang 3.9 and 4.0 crash when attempting to use them. Also see Issue 34249, Panic when using _addcarryx_u64 with Clang 3.9. According to Craig Topper, it should be fixed in Clang 5.0.
My apologies for posting this information under an MSVC question. This is one of the few hits when searching for information about using the intrinsics.