
GCC C vector extension: How to check if result of ANY element-wise comparison is true, and which?

I am new to GCC's C vector extensions. According to the manual, the result of comparing one vector to another in the form (test = vec1 > vec2;) is that "test" contains a 0 in each element that is false and a -1 in each element that is true.

But how can I very quickly check whether ANY of the element comparisons was true? And, further, how can I tell which is the first element for which the comparison was true?

For example, with:

vec1 = {1,1,3,1};
vec2 = {1,2,2,2};
test = vec1 > vec2;

I want to determine if "test" contains any truth (non-zero elements). In this case I want "test" to reduce to true, because there exists an element for which vec1 is greater than vec2 and hence an element in test containing -1.

Additionally, or alternatively, I want to quickly discover WHICH element satisfies the comparison. In this case, that is simply index 2. Said another way, I want to find the first non-zero element of "test".

int hasAnyTruth = ...; // should be non-zero. "bool" works too since C99
int whichTrue = ...; // should contain 2, because test[2] == -1

I imagine we could use a SIMD reduction-addition instruction (?) to sum everything in the vector into one number and compare that sum to 0, but I don't know how (or whether there is a faster way). I am guessing some form of argmax is necessary for the second question, but again, I don't know how to instruct GCC to use it on the vectors.

asked Jul 23 '15 at 20:07 by user1649948



2 Answers

Clang's vector extensions do a good job with an any function written as a plain loop.

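#include <stdint.h>

// The sizes are not shown in the original; these values are assumed from the AVX2 (ymm) assembly below.
#define SIMD_SIZE 32  // bytes per vector
#define VLI_SIZE  4   // int64_t lanes per vector
#define VDF_SIZE  4   // double lanes per vector

#if defined(__clang__)
typedef int64_t vli __attribute__ ((ext_vector_type(VLI_SIZE)));
typedef double  vdf __attribute__ ((ext_vector_type(VDF_SIZE)));
#else
typedef int32_t vsi __attribute__ ((vector_size (SIMD_SIZE)));
typedef int64_t vli __attribute__ ((vector_size (SIMD_SIZE)));
#endif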

static bool any(vli const & x) {
  for(int i=0; i<VLI_SIZE; i++) if(x[i]) return true;
  return false;
}

Assembly generated by Clang:

any(long __vector(4) const&): # @any(long __vector(4) const&)
  vmovdqa ymm0, ymmword ptr [rdi]
  vptest ymm0, ymm0
  setne al
  vzeroupper
  ret

Although pmovmskb might still be a better choice, ptest is a huge improvement over what GCC generates for the same function:

any(long __vector(4) const&):
  cmp QWORD PTR [rdi], 0
  jne .L5
  cmp QWORD PTR [rdi+8], 0
  jne .L5
  cmp QWORD PTR [rdi+16], 0
  jne .L5
  cmp QWORD PTR [rdi+24], 0
  setne al
  ret
.L5:
  mov eax, 1
  ret

GCC should fix this. Clang is not optimal for AVX512 though.

I would argue that any is a critical vector function, so compilers should either provide a builtin for it, as they do for shuffle (e.g. __builtin_shuffle for GCC and __builtin_shufflevector for Clang), or be smart enough to figure out the optimal code, as Clang does for SSE and AVX but not for AVX512.
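Until such a builtin exists, the shuffle builtins mentioned above can be used to OR-reduce a comparison result by hand. The following is only a sketch for the question's four-int case: the names v4si, any_true and first_true are made up for illustration, and whether it compiles to anything as tight as Clang's vptest depends on the compiler version and flags, so check the generated assembly.

```c
#include <stdint.h>

/* Four 32-bit lanes in a 16-byte vector, matching the question's example.
   (v4si is an illustrative name, not something from the original post.) */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Non-zero if any lane of a comparison result is set.
   One shuffle folds the upper half onto the lower half; the remaining
   two lanes are then OR-ed together as scalars. */
static int any_true(v4si v)
{
#if defined(__clang__)
    v4si swapped = __builtin_shufflevector(v, v, 2, 3, 0, 1);
#else
    const v4si swap_halves = {2, 3, 0, 1};
    v4si swapped = __builtin_shuffle(v, swap_halves);
#endif
    v |= swapped;                 /* lane 0 = v0|v2, lane 1 = v1|v3 */
    return (v[0] | v[1]) != 0;
}

/* Index of the first non-zero lane, or -1 if every lane is zero.
   A plain loop: simple and portable, not necessarily the fastest form. */
static int first_true(v4si v)
{
    for (int i = 0; i < 4; i++)
        if (v[i])
            return i;
    return -1;
}
```

With the question's vectors, vec1 > vec2 sets only lane 2, so any_true reports a match and first_true returns that lane's index.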

answered Oct 17 '22 at 11:10 by Z boson



From Mystical:

_mm_movemask_epi8()

It's more portable across compilers than GCC vector extensions. It's an Intel-defined intrinsic, so it will work in every major compiler that targets x86: GCC, Clang, MSVC, ICC, etc.

http://software.intel.com/sites/landingpage/IntrinsicsGuide
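As a rough illustration of how the intrinsic answers both parts of the question, here is a minimal sketch for four 32-bit ints. The helper names gt_mask and first_true_lane are hypothetical, __builtin_ctz is a GCC/Clang builtin (MSVC would need _BitScanForward instead), and the scalar branch is only there so the sketch still builds on non-x86 targets.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2: _mm_cmpgt_epi32, _mm_movemask_epi8 */
#endif

/* Returns a 16-bit mask with one bit per byte of the comparison a > b.
   Each 32-bit lane contributes four adjacent bits, so a true lane i
   shows up as 0xF << (4 * i). */
static int gt_mask(const int32_t a[4], const int32_t b[4])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    return _mm_movemask_epi8(_mm_cmpgt_epi32(va, vb));
#else
    /* Scalar fallback (assumption: added only for non-x86 builds). */
    int mask = 0;
    for (int i = 0; i < 4; i++)
        if (a[i] > b[i])
            mask |= 0xF << (4 * i);
    return mask;
#endif
}

/* Index of the first true lane, or -1 if the mask is empty.
   Counting trailing zero bits gives the first set byte; dividing
   by 4 converts a byte position into a 32-bit lane index. */
static int first_true_lane(int mask)
{
    return mask ? __builtin_ctz((unsigned)mask) / 4 : -1;
}
```

Here hasAnyTruth is simply gt_mask(...) != 0, and whichTrue is first_true_lane of the same mask, so one comparison and one movemask serve both questions.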

answered Oct 17 '22 at 10:10 by user4842163
