I am new to GCC's C vector extensions. According to the manual, the result of comparing one vector to another in the form (test = vec1 > vec2;) is that "test" contains a 0 in each element that is false and a -1 in each element that is true.
But how can I quickly check whether ANY of the element comparisons was true? And, further, how can I tell which is the first element for which the comparison was true?
For example, with:
vec1 = {1,1,3,1};
vec2 = {1,2,2,2};
test = vec1 > vec2;
I want to determine if "test" contains any truth (non-zero elements). In this case I want "test" to reduce to true, because there exists an element for which vec1 is greater than vec2 and hence an element in test containing -1.
Additionally, or alternatively, I want to quickly discover WHICH element satisfies the comparison. In this case, that would be index 2. Said another way, I want to find the first non-zero element of "test".
int hasAnyTruth = ...; // should be non-zero. "bool" works too since C99
int whichTrue = ...; // should contain 2, because test[2] == -1
I imagine we could use a simd reduction-addition command (?) to sum everything in the vector into a number and compare that sum to 0, but I don't know how (or if there is a faster way). I am guessing some form of argmax is necessary for the second question, but again, I don't know how to instruct GCC to use it on the vectors.
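To make the two requested operations concrete, here is a minimal portable sketch written only with the vector extensions themselves. The type name v4si and the helper names has_any_truth and first_true are made up for illustration, and the vector is assumed to hold four 32-bit ints. It says nothing about which instructions the compiler will emit; it only pins down the intended results.

```c
/* Four 32-bit ints in one 16-byte vector (GCC/Clang vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* Non-zero if at least one lane of the comparison mask is non-zero.
   OR-ing the lanes avoids the sign issues of summing the -1 values. */
static int has_any_truth(v4si mask) {
    return (mask[0] | mask[1] | mask[2] | mask[3]) != 0;
}

/* Index of the first non-zero lane, or -1 if every lane is zero. */
static int first_true(v4si mask) {
    for (int i = 0; i < 4; i++)
        if (mask[i]) return i;
    return -1;
}
```

With vec1 = {1,1,3,1} and vec2 = {1,2,2,2}, passing vec1 > vec2 to these helpers gives a non-zero result from has_any_truth and 2 from first_true.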
Clang's vector extensions do a good job with the any function.
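// VLI_SIZE, VDF_SIZE and SIMD_SIZE are assumed to be defined elsewhere;
// the assembly below corresponds to VLI_SIZE == 4 (a 32-byte vector).
#if defined(__clang__)
// ext_vector_type takes the number of elements
typedef int64_t vli __attribute__ ((ext_vector_type(VLI_SIZE)));
typedef double vdf __attribute__ ((ext_vector_type(VDF_SIZE)));
#else
// vector_size takes the size in bytes
typedef int32_t vsi __attribute__ ((vector_size (SIMD_SIZE)));
typedef int64_t vli __attribute__ ((vector_size (SIMD_SIZE)));
#endif
// Returns true if any element of x is non-zero (C++, x is passed by reference).
static bool any(vli const & x) {
  for (int i = 0; i < VLI_SIZE; i++)
    if (x[i]) return true;
  return false;
}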
Assembly (Clang):
any(long __vector(4) const&): # @any(long __vector(4) const&)
vmovdqa ymm0, ymmword ptr [rdi]
vptest ymm0, ymm0
setne al
vzeroupper
ret
Although pmovmskb might still be a better choice, ptest is still a huge improvement over what GCC does:
any(long __vector(4) const&):
cmp QWORD PTR [rdi], 0
jne .L5
cmp QWORD PTR [rdi+8], 0
jne .L5
cmp QWORD PTR [rdi+16], 0
jne .L5
cmp QWORD PTR [rdi+24], 0
setne al
ret
.L5:
mov eax, 1
ret
GCC should fix this. Clang is not optimal for AVX512 though.
I would argue that the any function is a critical vector function, so compilers should either provide a builtin for it, as they do for shuffle (e.g. __builtin_shuffle for GCC and __builtin_shufflevector for Clang), or be smart enough to figure out the optimal code themselves, as Clang does at least for SSE and AVX but not for AVX512.
From Mystical:
_mm_movemask_epi8()
It's more portable than GCC vector extensions. It's standardized by Intel, so it will work in every major compiler: GCC, Clang, MSVC, ICC, etc...
http://software.intel.com/sites/landingpage/IntrinsicsGuide
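A minimal sketch of how Mystical's suggestion could answer both parts of the question, assuming an x86 target with SSE2 and four signed 32-bit lanes. The helper names any_gt and first_gt are hypothetical. __builtin_ctz is a GCC/Clang builtin, so that one line is not portable to MSVC, where _BitScanForward would play the same role.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Non-zero if a[i] > b[i] for at least one 32-bit lane. */
static int any_gt(__m128i a, __m128i b) {
    /* _mm_cmpgt_epi32 sets each lane to 0 or -1; _mm_movemask_epi8
       collects the top bit of each of the 16 bytes into an int. */
    return _mm_movemask_epi8(_mm_cmpgt_epi32(a, b)) != 0;
}

/* Index of the first lane with a[i] > b[i], or -1 if there is none. */
static int first_gt(__m128i a, __m128i b) {
    int mask = _mm_movemask_epi8(_mm_cmpgt_epi32(a, b));
    /* Each 32-bit lane contributes 4 mask bits, so divide the
       position of the lowest set bit by 4 to get the lane index. */
    return mask ? __builtin_ctz((unsigned)mask) / 4 : -1;
}
```

For the vectors in the question, built with _mm_setr_epi32(1,1,3,1) and _mm_setr_epi32(1,2,2,2), the mask is 0x0F00, so any_gt is non-zero and first_gt yields 2.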