Bit packing of groups of n repeated bits in a 32-bit word, compact to 1 bit per group

Question

I have an unsigned 32-bit integer value that contains multiple groups of bits. All bits within one group are always the same. Each group has the same size n (from 1 to 32 and is unknown at compile time). There are no gaps between the groups.

I need to convert each group of these bits into one bit. Next, I need to pack these bits as depicted below:

Bit Packing

I have come up with the following C function that iterates over these bits and packs them for an arbitrary n between 1 and 32.

uint32_t PackBits(uint32_t value, uint8_t n)
{
    if (n<2) return value;

    uint32_t res = 0;
    uint32_t mask = 1;  // Also works for: mask= 1 << n-1; mask= 1 << n-2; mask= 1 << n-3; ... ; mask= 1 << n-n; 
    uint8_t s =  0;

    do {        
        res |= (value & mask) >> s;
        mask <<= n;
        s += n-1;
    } while (mask);

    return res;
}

Q: Is there a way to accomplish this goal in C without looping over these bits, with some clever sequence of multiplications, De Bruijn¹, carryless additions, xor, negations, etc... ?

For example, by taking advantage of the following bit-gathering property of multiplication, where d,c,b,a denote the bits representatively chosen from each bit group of the input value:

     byte 3  byte 2  byte 1  byte 0
  ├──────┤├──────┤├──────┤├──────┤
  ddddddddccccccccbbbbbbbbaaaaaaaa  (input value)
& 00000001100000000000000110000000  (bitwise "and" mask 0x1800180 for extracting the chosen bits)
  ────────────────────────────────
= 0000000dc00000000000000ba0000000  (chosen bits generated by the bitwise "and" operation)
× 00000000000000000100000000000001  (a constant magic multiplier 0x4001. Works only for n==8)
  ────────────────────────────────
= 0000000dcba000000000000ba0000000  (the product modulo 2^32)

Note that the "product modulo 2^32" can be right shifted by 21 to yield the correct output value

Similar bit-gathering operations can be made with different constant multipliers for n between 6 and 31.

Maybe a similar bit-gathering operations can be made with division, too.

Footnote 1: De Bruijn sequences as perfect hash functions:

For n==2, the operation (value*0x2AAB & 0x7FFF8000) >> 15 generates a UNIQUE 16-bit number which could index a 64K element LUT, or have something done to map it to the desired bits. 64K of uint16_ts would fit in L2 cache on modern CPUs, but would only be reasonable if we were doing this for a lot of values in a row so it would pay for the demand misses to get data into L2 / L1d cache in the first place, and pay for the later cache misses from stuff that wouldn't have been evicted by other strategies that run more instructions but avoid a LUT.

For n==3, (value*0x40024B7 & 0x0FFC0000) >> 18 generates a UNIQUE 10-bit value (which can index a 1024 element LUT).
For n==4, (value*0x22AAF & 0x1FE00000) >> 21 generates a UNIQUE 8-bit value (which can index a 256 element LUT). A 256 element LUT is actually reasonable, especially since its entries are only uint8_t in size.
For n==5, (value*0x84ADF & 0x03F00000) >> 20 generates a UNIQUE 6-bit value (which can index a 64 element LUT). A 64 element LUT is actually reasonable, especially since its entries are only uint8_t in size.

(These numbers haven't been checked carefully for uniqueness)

0___________ · Accepted Answer

Probably it will be more efficient than looping as loops are not pipeline-friendly.

#define MASK(n, x) ((1U << (n + 1)) - 1) << ((x) * (n))
#define BITVAL(val, n , x) ((((val) & MASK(n,x)) == MASK(n,x))  << (x))
#define PACKBITS4(val) (BITVAL(val, 4, 0) + BITVAL(val, 4, 1) + BITVAL(val, 4, 2) + BITVAL(val, 4, 3))
#define PACKBITS5(val) (BITVAL(val, 5, 0) + BITVAL(val, 5, 1) + BITVAL(val, 5, 2))

it is for 16bits integers but you can hardcode all possible ns. When n is a constant expression it will generate code with no jumps, if not it will jump but not loop

https://godbolt.org/z/nzqYsee3Y

It can be potentially faster but it has to be tested.

Bit packing of groups of n repeated bits in a 32-bit word, compact to 1 bit per group

Tags:

c

bit-manipulation

micro-optimization

bit-packing

George Robinson

1 Answers

0___________

Recent Activity

Donate For Us

Bit packing of groups of n repeated bits in a 32-bit word, compact to 1 bit per group

Tags:

c

bit-manipulation

micro-optimization

bit-packing

George Robinson

1 Answers

0___________

Related questions

Recent Activity

Donate For Us