Consider the following small search function:
template <uint32_t N>
int32_t countsearch(const uint32_t *base, uint32_t needle) {
    uint32_t count = 0;
    #pragma clang loop vectorize(disable)
    for (const uint32_t *probe = base; probe < base + N; probe++) {
        if (*probe < needle)
            count++;
    }
    return count;
}
At -O2 or higher, clang vectorizes this search, resulting in code like this (for 10 elements):
int countsearch<10u>(unsigned int const*, unsigned int): # @int countsearch<10u>(unsigned int const*, unsigned int)
    vmovd xmm0, esi
    vpbroadcastd ymm0, xmm0
    vpbroadcastd ymm1, dword ptr [rip + .LCPI0_0] # ymm1 = [2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648]
    vpxor ymm2, ymm1, ymmword ptr [rdi]
    vpxor ymm0, ymm0, ymm1
    vpcmpgtd ymm0, ymm0, ymm2
    cmp dword ptr [rdi + 32], esi
    vpsrld ymm1, ymm0, 31
    vextracti128 xmm1, ymm1, 1
    vpsubd ymm0, ymm1, ymm0
    vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
    vpaddd ymm0, ymm0, ymm1
    vphaddd ymm0, ymm0, ymm0
    vmovd eax, xmm0
    adc eax, 0
    cmp dword ptr [rdi + 36], esi
    adc eax, 0
    vzeroupper
    ret
How can I disable this vectorization on the command line or using a #pragma in the code?
I tried the following command line arguments, none of which prevented the vectorization:
-disable-loop-vectorization
-disable-vectorization
-fno-vectorize
-fno-tree-vectorize
I also tried #pragma clang loop vectorize(disable) above the loop, as you can see in the code above, without luck.
GCC Autovectorization flags
With the optimization flags -O3 or -ftree-vectorize, GCC searches for loops it can vectorize (remember to specify the -mavx flag too if AVX instructions should be used). The source code remains the same, but the code GCC generates is completely different.
Vectorization, in simple terms, means restructuring a computation so that it can use the processor's SIMD instructions. AVX, AVX2 and AVX-512 are x86 (Intel) instruction set extensions that perform the same operation on multiple data elements in one instruction. For example, with AVX-512 you can operate on 16 four-byte integer values at a time.
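The "16 values at a time" figure is just register width divided by element width. Here is a minimal sketch of that arithmetic; lanes_per_register is a hypothetical helper name used only for illustration, not part of any library:

```cpp
#include <cstdint>

// Hypothetical helper (not a library function): how many elements of a
// given width fit into one SIMD register of a given width.
constexpr unsigned lanes_per_register(unsigned register_bits, unsigned element_bits) {
    return register_bits / element_bits;
}

// AVX-512 zmm register: 512 bits / 32-bit integers = 16 lanes.
static_assert(lanes_per_register(512, 8 * sizeof(uint32_t)) == 16, "zmm holds 16 x uint32_t");
// AVX/AVX2 ymm register: 256 bits / 32-bit integers = 8 lanes.
static_assert(lanes_per_register(256, 8 * sizeof(uint32_t)) == 8, "ymm holds 8 x uint32_t");
// SSE xmm register: 128 bits / 32-bit integers = 4 lanes.
static_assert(lanes_per_register(128, 8 * sizeof(uint32_t)) == 4, "xmm holds 4 x uint32_t");
```

The 8-lane ymm case appears to match the listing in the question: one 256-bit load at [rdi] covers the first 8 of the 10 elements, and the remaining two are handled by the scalar cmp/adc pairs at offsets 32 and 36.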
Loop vectorization transforms procedural loops by assigning a processing unit to each pair of operands. Programs spend most of their time within such loops. Therefore, vectorization can significantly accelerate them, especially over large data sets.
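To make the transformation concrete, here is a rough scalar model of what a vectorized counting loop looks like. This is only a sketch of the general shape, not what any particular compiler emits; the function names and the lane count of 8 (one 256-bit register of 32-bit values) are assumptions chosen for illustration:

```cpp
#include <cstddef>
#include <cstdint>

// Plain scalar loop: one comparison per iteration.
uint32_t count_below_scalar(const uint32_t *data, std::size_t n, uint32_t needle) {
    uint32_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        count += data[i] < needle;  // adds 1 when the comparison is true
    return count;
}

// Assumed lane count: 8 x 32-bit values, as in one 256-bit register.
constexpr std::size_t kLanes = 8;

// Scalar model of the vectorized shape: process kLanes elements per step,
// keep one partial count per lane, then reduce and finish the leftovers.
uint32_t count_below_chunked(const uint32_t *data, std::size_t n, uint32_t needle) {
    uint32_t lane_counts[kLanes] = {};
    std::size_t i = 0;

    // Main body: each inner iteration stands in for one SIMD lane.
    for (; i + kLanes <= n; i += kLanes)
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            lane_counts[lane] += data[i + lane] < needle;

    // Horizontal reduction: sum the per-lane partial counts.
    uint32_t count = 0;
    for (std::size_t lane = 0; lane < kLanes; ++lane)
        count += lane_counts[lane];

    // Scalar tail: elements that did not fill a whole chunk.
    for (; i < n; ++i)
        count += data[i] < needle;

    return count;
}
```

Both functions return the same result for any input. The second one just exposes the three phases (wide body, horizontal sum, scalar tail) that also show up in vectorized assembly.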
Vectorization is the use of vector instructions to speed up program execution. It can be done by programmers, who write vector instructions explicitly, or by a compiler. The latter case is called auto-vectorization.
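Turn off SLP vectorization:
clang++ -O2 -fno-slp-vectorize
The flags and the pragma you tried target the loop vectorizer. With a small compile-time trip count like 10, the loop is most likely fully unrolled first, and the resulting straight-line code is then vectorized by the separate SLP vectorizer, which those options do not affect.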
Godbolt Link
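For reference, here is a self-contained version of the function from the question that can be compiled with the flag above to inspect the output. This is a sketch: the file name example.cpp is hypothetical, and the explicit instantiation is something I added so that code for N = 10 is emitted without a caller.

```cpp
// Build and inspect the assembly (example.cpp is a hypothetical file name):
//   clang++ -O2 -fno-slp-vectorize -S example.cpp
#include <cstdint>

template <uint32_t N>
int32_t countsearch(const uint32_t *base, uint32_t needle) {
    uint32_t count = 0;
    // Kept from the question; it addresses the loop vectorizer only.
    #pragma clang loop vectorize(disable)
    for (const uint32_t *probe = base; probe < base + N; probe++) {
        if (*probe < needle)
            count++;
    }
    return count;
}

// Explicit instantiation (an addition for this sketch) so the compiler
// generates code for the 10-element case even without a call site.
template int32_t countsearch<10u>(const uint32_t *, uint32_t);
```

The flag only changes how the code is generated, not what it computes, so the function should return the same counts with or without it.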