Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Branchless way to map 0 to any nonzero value while leaving other values alone?

I am trying to find the the fastest possible implementation of this function:

uint16_t forbid_zero(uint16_t x)
{
    if(x == 0)
        return SOMETHING_NONZERO;
    return x;
}

It doesn't matter what SOMETHING_NONZERO is, as long as it's not zero. Any value other than zero should pass through unmodified. What's the fastest bit hackery for doing this? I assume there's some nice branchless way.

For context I have an algorithm in my critical path where zero as an input value will trigger infinite looping and other bad behavior, and I'm curious if it's possible for me to massage the input to always be nonzero without branching to check for 0. The consequence of passing an incorrect nonzero value to the algorithm is not nearly as bad; the mistake will be caught by checks that already have to exist at other layers, so mapping 0 to any other value is good enough.

like image 644
Joseph Garvin Avatar asked Oct 26 '25 09:10

Joseph Garvin


2 Answers

There are several methods.

  1. Direct.
unsigned is_all_zeros(uint16_t x)
{
  return x ? x : 0;
}
  1. Negate, invert, and shift.
unsigned is_all_zeros(uint16_t x)
{
  uint32_t y = -x; // Zero: MSB=0, non-zeros: MSB=1
  y = ~y;          // Zero: MSB=1, non-zeros: MSB=0
  return y >> 31;  // Zero: 1, non-zeros: 0
}
  1. Invert, popcnt, and shift.
unsigned is_all_zeros(uint16_t x)
{
  x = -x;  // Zero: 16 '1's, non-zeros: 15 or less '1's
  unsigned c = __builtin_popcount(x);  // Zero: 16, non-zeros: 15 or less
  return c >> 4; // Zero: 1, non-zeros: 0
}
  1. Invert, inc, and shift.
unsigned is_all_zeros(uint16_t x)
{
  x = -x;  // Zero: 0xffff, non-zeros: less than 0xffff
  unsigned c = (unsigned)(x) + 1;  // Zero: 0x10000, non-zeros: 0xffff or less
  return c >> 4; // Zero: 1, non-zeros: 0
}
like image 85
Fuzzier Avatar answered Oct 28 '25 23:10

Fuzzier


One possible implementation is:

uint16_t forbid_zero(uint16_t x)
{
    return x | !x;
}

Which the Compile Explorer shows x86-64 gcc 8.2 compiles to:

forbid_zero(unsigned short):
        xorl    %eax, %eax
        testw   %di, %di
        sete    %al
        orl     %edi, %eax
        ret

However even the implementation you give in the question compiles to branchless code using a conditional move instruction with the same compiler:

forbid_zero(unsigned short):
        testw   %di, %di
        movl    $1, %eax
        cmovne  %edi, %eax
        ret

...and of course there's no guarantee that !x won't be compiled to a branch, either.

like image 43
caf Avatar answered Oct 28 '25 23:10

caf



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!