Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compact a hex number

Is there a clever (ie: branchless) way to "compact" a hex number. Basically move all the 0s all to one side?

eg:

0x10302040 -> 0x13240000

or

0x10302040 -> 0x00001324

I looked on Bit Twiddling Hacks but didn't see anything.

It's for a SSE numerical pivoting algorithm. I need to remove any pivots that become 0. I can use _mm_cmpgt_ps to find good pivots, _mm_movemask_ps to convert that in to a mask, and then bit hacks to get something like the above. The hex value gets munged in to a mask for a _mm_shuffle_ps instruction to perform a permutation on the SSE 128 bit register.

like image 653
Jay Lemmon Avatar asked Jul 02 '14 19:07

Jay Lemmon


2 Answers

To compute mask for _pext:

mask = arg;
mask |= (mask << 1) & 0xAAAAAAAA | (mask >> 1) & 0x55555555;
mask |= (mask << 2) & 0xCCCCCCCC | (mask >> 2) & 0x33333333;

First do bit-or on pairs of bits, then on quads. Masks prevent shifted values from overflowing to other digits.

After computing mask this way or harold's way (which is probably faster) you don't need the full power of _pext, so if targeted hardware doesn't support it you can replace it with this:

for(int i = 0; i < 7; i++) {
    stay_mask = mask & (~mask - 1);
    arg = arg & stay_mask | (arg >> 4) & ~stay_mask;
    mask = stay_mask | (mask >> 4);
}

Each iteration moves all nibbles one digit to the right if there is some space. stay_mask marks bits that are in their final positions. This uses somewhat less operations than Hacker's Delight solution, but might still benefit from branching.

like image 190
zch Avatar answered Sep 21 '22 12:09

zch


Supposing we can use _pext_u32, the issue then is computing a mask that has an F for every nibble that isn't zero. I'm not sure what the best approach is, but you can compute the OR of the 4 bits of the nibble and then "spread" it back out to F's like this:

// calculate horizontal OR of every nibble
x |= x >> 1;
x |= x >> 2;
// clean up junk
x &= 0x11111111;
// spread
x *= 0xF;

Then use that as the mask of _pext_u32.

_pext_u32 can be emulated by this (taken from Hacker's Delight, figure 7.6)

unsigned compress(unsigned x, unsigned m) {
   unsigned mk, mp, mv, t; 
   int i; 

   x = x & m;           // Clear irrelevant bits. 
   mk = ~m << 1;        // We will count 0's to right. 

   for (i = 0; i < 5; i++) {
      mp = mk ^ (mk << 1);             // Parallel prefix. 
      mp = mp ^ (mp << 2); 
      mp = mp ^ (mp << 4); 
      mp = mp ^ (mp << 8); 
      mp = mp ^ (mp << 16); 
      mv = mp & m;                     // Bits to move. 
      m = m ^ mv | (mv >> (1 << i));   // Compress m. 
      t = x & mv; 
      x = x ^ t | (t >> (1 << i));     // Compress x. 
      mk = mk & ~mp; 
   } 
   return x; 
}

But that's a bit of a disaster. It's probably better to just resort to branching code then.

like image 34
harold Avatar answered Sep 19 '22 12:09

harold