First Value:
I have a binary value which is actually a compact series of 2-bit values. (That is, each 2 bits in the binary value represents 0, 1, 2, or 3.) So, for example, 0, 3, 1, 2 becomes 00110110. In this binary string, all I care about are the 3's (or alternately, I could flip the bits and only care about the 0's, if that makes your answer easier). All the other numbers are irrelevant (for reasons we'll get into in a bit).
Second Value:
I have a second binary value which is also a compacted series of 2-bit values represented the same way. It has an identical length to the First Value.
Math:
I want the sum of the 2-bit numbers in the Second Value that have the same position as a 3 from the First Value. In other words, if I have:
First: 11000011
Second: 01111101
Then my answer would be "2" (I added the first number and the last number from "Second" together, because those were the only ones that had a "11" in the First Value that matched them.)
I want to do this in as few clock cycles as possible (either on a GPU or on an x86 architecture). However, I'm generally looking for an algorithm, not an assembler solution. Is there any way faster than masking off two bits at a time from each number and running several loops?
Sure.
// the two numbers
unsigned int a;
unsigned int b;
Now create a mask from a that contains '1' bit at an odd position only if in a there was '11' ending at same position.
unsigned int mask = a & (a >> 1) & 0x55555555;
Expand it to get the '11' pattern back:
mask = mask | (mask << 1);
So now if a was 1101100011, mask is 1100000011.
Then mask b with the mask:
b = b & mask;
You can then perform the addition of (masked) numbers from b in parallel:
b = (b & 0x33333333) + ((b & 0xcccccccc) >> 2);
b = (b & 0x0f0f0f0f) + ((b & 0xf0f0f0f0) >> 4);
b = (b & 0x00ff00ff) + ((b & 0xff00ff00) >> 8);
b = (b & 0x0000ffff) + ((b & 0xffff0000) >> 16);
For a 32-bit number, the sum is now at the lowest bits of b. This is a commonly known pattern for parallel addition of bit fields. For larger than 32-bit numbers, you would add one more round for 64-bit numbers and two rounds for 128-bit numbers.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With