Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find all the 2-bit values that match against another binary pattern and then sum them

First Value:

I have a binary value which is actually a compact series of 2-bit values. (That is, each 2 bits in the binary value represents 0, 1, 2, or 3.) So, for example, 0, 3, 1, 2 becomes 00110110. In this binary string, all I care about are the 3's (or alternately, I could flip the bits and only care about the 0's, if that makes your answer easier). All the other numbers are irrelevant (for reasons we'll get into in a bit).

Second Value:

I have a second binary value which is also a compacted series of 2-bit values represented the same way. It has an identical length to the First Value.

Math:

I want the sum of the 2-bit numbers in the Second Value that have the same position as a 3 from the First Value. In other words, if I have:

First:  11000011
Second: 01111101

Then my answer would be "2" (I added the first number and the last number from "Second" together, because those were the only ones that had a "11" in the First Value that matched them.)

I want to do this in as few clock cycles as possible (either on a GPU or on an x86 architecture). However, I'm generally looking for an algorithm, not an assembler solution. Is there any way faster than masking off two bits at a time from each number and running several loops?

like image 449
Max Kanat-Alexander Avatar asked May 12 '12 09:05

Max Kanat-Alexander


1 Answers

Sure.

 // the two numbers
 unsigned int a;
 unsigned int b;

Now create a mask from a that contains '1' bit at an odd position only if in a there was '11' ending at same position.

 unsigned int mask = a & (a >> 1) & 0x55555555;

Expand it to get the '11' pattern back:

 mask = mask | (mask << 1);

So now if a was 1101100011, mask is 1100000011.

Then mask b with the mask:

 b = b & mask;

You can then perform the addition of (masked) numbers from b in parallel:

 b = (b & 0x33333333) + ((b & 0xcccccccc) >> 2);
 b = (b & 0x0f0f0f0f) + ((b & 0xf0f0f0f0) >> 4);
 b = (b & 0x00ff00ff) + ((b & 0xff00ff00) >> 8);
 b = (b & 0x0000ffff) + ((b & 0xffff0000) >> 16);

For a 32-bit number, the sum is now at the lowest bits of b. This is a commonly known pattern for parallel addition of bit fields. For larger than 32-bit numbers, you would add one more round for 64-bit numbers and two rounds for 128-bit numbers.

like image 60
Antti Huima Avatar answered Sep 23 '22 06:09

Antti Huima