Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AVX2: BitScanReverse or CountLeadingZeros on 8 bit elements in AVX register

I would like to extract the index of the highest set bit in a 256 bit AVX register with 8 bit elements. I could neither find a bsr nor a clz implementation for this.

For clz with 32 bit elements, there is the bithack with a float conversion, but that is probably not possible for 8 bits.

Currently, I am working on a solution, where I check the bits one by one, which I will add later, but I wonder if there is a faster way to do this.

like image 589
simonlet Avatar asked Dec 31 '22 13:12

simonlet


1 Answers

Here is a vpshufb based solution. The idea is to split the input into two halves, make a lookup on both and combine the results:

__m256i clz_epu8(__m256i values)
{
    // extract upper nibble:
    __m256i hi = _mm256_and_si256(_mm256_srli_epi16(values, 4), _mm256_set1_epi8(0xf));
    // this sets the highest bit for values >= 0x10 and otherwise keeps the lower nibble unmodified:
    __m256i lo = _mm256_adds_epu8(values, _mm256_set1_epi8(0x70));

    // lookup tables for count-leading-zeros (replace this by _mm256_setr_epi8, if this does not get optimized away)
    // ideally, this should compile to vbroadcastf128 ...
    const __m256i lookup_hi = _mm256_broadcastsi128_si256(_mm_setr_epi8(0, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0));
    const __m256i lookup_lo = _mm256_broadcastsi128_si256(_mm_setr_epi8(8, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4));

    // look up each half
    __m256i clz_hi = _mm256_shuffle_epi8(lookup_hi, hi);
    __m256i clz_lo = _mm256_shuffle_epi8(lookup_lo, lo);

    // combine results (addition or xor would work as well)
    return _mm256_or_si256(clz_hi, clz_lo);
}

godbolt-link with crude test: https://godbolt.org/z/MYq74Wxdh

like image 175
chtz Avatar answered Jan 24 '23 09:01

chtz