
GCC C vector extension: How to test the result of a comparison (for conditional assignment, etc)?

Background: GCC's built-in vector extensions for C allow a fairly natural representation of SIMD vectors as C "types." According to the documentation, many built-in operators are supported (+, -, etc.). However, the ternary operator and the logical operators (&&, ||) only work in C++ for some reason. This is a problem for an all-C codebase.

The question: In GCC C, how would one implement SIMD-compatible [branchless] conditionals of the form:

    v4si a = {2,-1,3,4}, b, indicesLessThan0;
    indicesLessThan0 = a < 0;
    b = indicesLessThan0 ? a : 0;

And, more generally, how to perform an arbitrary independent block of statements based on that same result:

    v4si c = {9,8,7,6}, d;
    for (int i = 0; i < 4; i++) {
      if (indicesLessThan0[i]) { // consider tests one by one
         b[i] = a[i]; // as the ternary operator does above
         d[i] = c[i] + 1; // some other independent operation
      }
      else {
         b[i] = 0; // as the ternary operator does above
         d[i] = c[i] - 1; // another independent operation
      }
    }

If doing a block of statements is harder (SIMD branching is bad), it would be fine to perform the ternary test again for any additional statements at the cost (supposedly) of some efficiency:

    d = indicesLessThan0 ? c + 1 : c - 1; // the other operation in the loop

But the ternary operator doesn't work in C for some reason the manual doesn't explain. Is there another easy way? Some way of using if statements?

user1649948 asked Sep 27 '22 05:09


1 Answer

I found three solutions by throwing the kitchen sink at the code.

  1. Switch to g++. Not too hard: it turns out most of the code can be ported just by putting a (type *) cast before every malloc/calloc/realloc call. Then I can just do:

    v16s8 condStor = test ? a : b;

  2. Even better, I discovered you can just bit-bash using combinations of & and |, the same way everyone does with bits inside integers. The trick is that a vector comparison sets every bit of a true lane to 1 (11111111..., i.e. -1 as a signed value) and every bit of a false lane to 0, so bitwise operators keep or discard whole values.

  3. Even better still, "type punning 101" with an intrinsic function:

    v16s8 condStor = b;
    __builtin_ia32_maskmovdqu (a, test, (char *)(&condStor));

    This takes advantage of the function dedicated to doing what #2 does in one fell swoop.
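To make solution 2 concrete, here is a minimal sketch in plain C applied to the vectors from the question. The names `select_v4si` and `branchless_update` are hypothetical helpers, not part of GCC. The sketch assumes four 32-bit int lanes and relies on vector comparisons producing -1 (all bits set) in true lanes and 0 in false lanes. An explicit zero vector is used in the comparison to stay conservative across compiler versions.

```c
/* 4 x 32-bit signed int lanes, as in the question. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: per lane, take x where mask is all ones, else y. */
static inline v4si select_v4si(v4si mask, v4si x, v4si y)
{
    return (mask & x) | (~mask & y);
}

/* Hypothetical helper: branchless version of the question's loop.
 * Lanes where a < 0: b = a, d = c + 1.
 * All other lanes:   b = 0, d = c - 1. */
static inline void branchless_update(v4si a, v4si c, v4si *b, v4si *d)
{
    v4si zero = {0, 0, 0, 0};
    v4si indicesLessThan0 = a < zero;   /* -1 where a[i] < 0, else 0 */

    *b = select_v4si(indicesLessThan0, a, zero);
    *d = select_v4si(indicesLessThan0, c + 1, c - 1);
}
```

Called with a = {2,-1,3,4} and c = {9,8,7,6}, this should leave b = {0,-1,0,0} and d = {8,9,6,5}. Both sides of each "branch" are computed for every lane and the mask only decides which result survives, so this only works when neither side has side effects.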

Not convinced? Check the assembly:

  1. pxor    %xmm1, %xmm1
    movdqa  -64(%rbp), %xmm0
    pcmpeqb %xmm1, %xmm0
    pcmpeqd %xmm1, %xmm1
    pandn   %xmm1, %xmm0
    pxor    %xmm1, %xmm1
    pcmpgtb %xmm0, %xmm1
    movdqa  %xmm1, %xmm0
    movdqa  -32(%rbp), %xmm2
    movdqa  -16(%rbp), %xmm1
    pand    %xmm0, %xmm1
    pandn   %xmm2, %xmm0
    por     %xmm1, %xmm0
    movaps  %xmm0, -80(%rbp)
    
  2. movdqa  -64(%rbp), %xmm0
    movdqa  %xmm0, %xmm1
    pand    -16(%rbp), %xmm1
    pcmpeqd %xmm0, %xmm0
    pxor    -64(%rbp), %xmm0
    pand    -32(%rbp), %xmm0
    por     %xmm1, %xmm0
    movaps  %xmm0, -80(%rbp)
    
  3. movdqa  -32(%rbp), %xmm0
    movaps  %xmm0, -80(%rbp)
    leaq    -80(%rbp), %rax
    movdqa  -16(%rbp), %xmm0
    movdqa  -64(%rbp), %xmm1
    movq    %rax, %rdi
    maskmovdqu  %xmm1, %xmm0
    

Judging by the listings, 1 is the most convoluted, followed by 2, then 3, so I now see the cost of the C++ abstraction. Maybe this is what Linus was ranting about back in the day. (No, probably not.) Anyway, hope this helps someone!

user1649948 answered Oct 12 '22 23:10