
Most efficient formula for unpacking 16-bit BCD? (e.g. 0x1234 to 0x01020304)

Is there a bit twiddling hack for efficiently unpacking a 16-bit packed BCD number?

Doing it the pedestrian way requires 10 operations (3 shifts, 4 ANDs and 3 ORs or ADDs):

x = (bcd & 0xF000) << 12
  | (bcd & 0x0F00) <<  8
  | (bcd & 0x00F0) <<  4
  | (bcd & 0x000F)

With a multi-way ADD/OR the critical path length would be 3 (AND, shift, combine). However, these operations tend to be binary, so on most CPUs combining the four terms takes two levels of OR and the critical path has length 4.
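For reference, the pedestrian version could be wrapped into a C function roughly as follows. This is only a sketch: the function name `unpack_bcd16_naive` is made up for illustration, and the input is widened to 32 bits first so that the shifts do not depend on the width of `int`.

```c
#include <stdint.h>

/* Unpack four packed BCD digits into one digit per byte,
   e.g. 0x1234 -> 0x01020304. The name is illustrative only. */
uint32_t unpack_bcd16_naive(uint16_t bcd)
{
    uint32_t b = bcd;               /* widen so all shifts happen in 32 bits */
    return ((b & 0xF000u) << 12)    /* digit 3 -> bits 24..27 */
         | ((b & 0x0F00u) <<  8)    /* digit 2 -> bits 16..19 */
         | ((b & 0x00F0u) <<  4)    /* digit 1 -> bits  8..11 */
         |  (b & 0x000Fu);          /* digit 0 stays in bits 0..3 */
}
```

Calling `unpack_bcd16_naive(0x1234)` gives `0x01020304`.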

Can this be done more efficiently?

Note: for some purposes it would be just as useful if some permutation of the nibbles could be unpacked especially efficiently, for example when the word to be unpacked comes from a lookup table whose creation I fully control (so that I can stick each digit wherever I want). The purpose of using packed instead of unpacked BCD in this case would be to halve the memory pressure and to avoid exceeding the size of the L1 cache, taking some load off an over-saturated memory subsystem by increasing the load on the CPU's ALUs.

For example, if I store the digits of 0x1234 in the permuted order 0x1324, then a simple de-interleave yields 0x01020304:

x = ((bcd << 12) | bcd) & 0x0F0F0F0F

That's just three operations with critical path length 3, quite an improvement over the original version...
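To make the permuted layout concrete, here is a minimal sketch in C of both sides: building a table entry in the permuted order and unpacking it again. The function names and the digit parameters `d3`..`d0` (most to least significant, each assumed to be in the range 0..9) are hypothetical and not part of the question.

```c
#include <stdint.h>

/* Build a table entry with the two middle digits swapped:
   digits d3 d2 d1 d0 are stored as nibbles d3 d1 d2 d0,
   so 1,2,3,4 becomes 0x1324. Names are illustrative only. */
uint16_t pack_bcd16_permuted(unsigned d3, unsigned d2, unsigned d1, unsigned d0)
{
    return (uint16_t)((d3 << 12) | (d1 << 8) | (d2 << 4) | d0);
}

/* De-interleave: OR the word with a copy of itself shifted by 12 bits,
   then keep only the low nibble of every byte. */
uint32_t unpack_bcd16_permuted(uint16_t bcd)
{
    uint32_t b = bcd;               /* widen so the shift happens in 32 bits */
    return ((b << 12) | b) & 0x0F0F0F0Fu;
}
```

With these definitions, `unpack_bcd16_permuted(pack_bcd16_permuted(1, 2, 3, 4))` gives `0x01020304`. The swap is what makes the single shift work: the shifted copy supplies the top two result bytes, the unshifted copy supplies the bottom two, and the mask discards the overlapping nibbles.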

asked Mar 02 '23 23:03 by DarthGizka


1 Answer

Here is an alternative with fewer operations (8 instead of 10) but a longer critical path (6 instead of 4). It is based on the binary decomposition of the distance each nibble has to move: the nibbles that must move by 8 or 12 bits are first moved together by 8, then the nibbles that must move by 4 or 12 bits are moved together by 4.

x = bcd
x = ((x & 0xFF00) << 8) | (x & 0xFF)
x = ((x & 0x00F000F0) << 4) | (x & 0x000F000F)

For example:

// start
0000ABCD
// move A and B by 8
00AB00CD
// move A and C by 4
0A0B0C0D
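As a sketch, the two steps above could be wrapped into a C function like this. The function name `unpack_bcd16_twostep` and the widening to `uint32_t` are additions for illustration; the masks and shifts are the ones shown in the answer.

```c
#include <stdint.h>

/* Two-step unpack: 0000ABCD -> 00AB00CD -> 0A0B0C0D.
   The name is illustrative only. */
uint32_t unpack_bcd16_twostep(uint16_t bcd)
{
    uint32_t x = bcd;                                  /* 0000ABCD */
    x = ((x & 0xFF00u) << 8) | (x & 0x00FFu);          /* move A and B by 8: 00AB00CD */
    x = ((x & 0x00F000F0u) << 4) | (x & 0x000F000Fu);  /* move A and C by 4: 0A0B0C0D */
    return x;
}
```

For the input `0x1234` the intermediate value is `0x00120034` and the result is `0x01020304`.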
answered Mar 05 '23 19:03 by harold
