
Fast way to find an intersection between two sets of numbers, one defined by a bitwise condition and another by an arithmetic condition

This is probably well covered ground, but I'm ignorant on the subject so I'll be using amateurish terminology. Let's say I'm messing around with some set of conditions that each define a non-empty set of numbers within an int, let's just say an 8-bit int. So for a bitwise one, I may have this:

11XX00XX

Saying that I want all bytes that have 1s where there are 1s, 0s where there are 0s, and don't care about the Xs. So 11110000 or 11000010 fulfills this, but 01110000 does not. Easy enough, right? For arithmetic conditions, I can only imagine there being some use of ==, !=, >=, >, <=, or < with comparison with a constant number. So I may say:

X > 16

So any number greater than 16 (00010000). What if I want to find all numbers that are in both of those above example sets? I can tell by looking at it that any numbers ending in 100XX will fit the requirements, so the bitwise part of the intersection includes 11X100XX. Then I have to include the region 111X00XX to fill the rest of the range above it, too. Right? Although I think for other cases, it wouldn't turn out so neatly, correct? Anyway, what is the algorithm behind this, for any of those arithmetic conditions versus any possible bitwise one? Surely there must be a general case!
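For concreteness, the intersection in this example can be checked by brute force. This is a hypothetical sketch in Python (the question names no language); it just enumerates all 256 byte values against both conditions:

```python
# Brute-force check of the example: pattern 11XX00XX intersected with x > 16.
# Enumerates every byte; the mask names are made up for this sketch.

ONES_MASK  = 0b11000000  # bits that must be 1
ZEROS_MASK = 0b00001100  # bits that must be 0

def matches_pattern(x):
    return (x & ONES_MASK) == ONES_MASK and (x & ZEROS_MASK) == 0

intersection = [x for x in range(256) if matches_pattern(x) and x > 16]

# Note: every match of 11XX00XX already has the top two bits set, so the
# smallest match is 11000000 = 192, which is itself greater than 16.
print(len(intersection), min(intersection))
```

A brute-force table like this is also a handy oracle for testing any cleverer algorithm you come up with.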

Assuming there is one, and it's probably something obvious, what if things get more complicated? What if my bitwise requirement becomes:

11AA00XX

Where anything marked with A must be the same. So 110000XX or 111100XX, but not 111000XX. For any number of same bit "types" (A, B, C, etc) in any number and at any positions, what is the optimal way of solving the intersection with some arithmetic comparison? Is there one?

I'm considering these bitwise operations to be sort of a single comparison operation/branch, just like how the arithmetic is set up. So maybe one is all the constants that, when some byte B 01110000 is ANDed with them, result in 00110000. So that region of constants, which is what my "bitwise condition" would be, would be X011XXXX, since X011XXXX AND 01110000 = 00110000. All of my "bitwise conditions" are formed by that reversal of an op like AND, OR, and XOR. Not sure if something like NAND would be included or not. This may limit what conditions are actually possible, maybe? If it does, then I don't care about those types of conditions.
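That "reversal" of an AND can also be checked by enumeration. A hypothetical sketch, using the B and R values from the paragraph above:

```python
# "Reversing" an AND: find every byte c with (c & B) == R.
# B and R are the values from the question; everything else is made up.

B = 0b01110000
R = 0b00110000

solutions = [c for c in range(256) if (c & B) == R]

# Where B has a 1, c is forced to match R; where B has a 0, c is free.
# That is exactly the pattern X011XXXX: 5 free bits, so 2**5 = 32 values.
print(len(solutions))
```

The same forced-bits/free-bits reasoning gives the "reversal" patterns for OR and XOR as well, just with different forcing rules per bit.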

Sorry for the meandering attempt at an explanation. Is there a name for what I'm doing? It seems like it'd be well-trodden ground in CS, so a name could lead me to some nice reading on the subject. But I'm mostly just looking for a good algorithm to solve this. I am going to end up using more than 2 things in the intersection (potentially dozens or many many more), so a way to solve it that scales well would be a huge plus.

asked May 04 '12 by user173342



2 Answers

Extending a bit on saluce's answer:

Bit testing

You can build test patterns, so you don't need to check each bit individually (testing the whole number at once is faster than testing one bit at a time, especially since the one-bit-at-a-time tests have to read the whole number anyway):

testOnes = 128 | 64   // the bits we want to be 1 (OR combines the masks; AND would yield 0)
testZeros = ~(8 | 4)  // the bits we want to be 0, inverted

Then perform your test this way:

if (!(~(x & testOnes) & testOnes) &&
    !(~(~x | testZeros))) {
  /* match! */
}
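The same two tests can be written out as a runnable sketch. Python is an assumption here (the answer's snippets are C-like); since Python ints are unbounded, the sketch masks with 0xFF to stay in 8 bits:

```python
# Runnable 8-bit version of the ones-test and zeros-test above.
# Masking with 0xFF emulates an 8-bit unsigned byte in Python.

test_ones  = 0b11000000             # 128 | 64: bits that must be 1
test_zeros = ~0b00001100 & 0xFF     # ~(8 | 4): bits that must be 0, inverted

def matches(x):
    ones_ok  = (~(x & test_ones) & test_ones) == 0       # all required 1s present
    zeros_ok = (~((~x & 0xFF) | test_zeros) & 0xFF) == 0  # all required 0s present
    return ones_ok and zeros_ok

print(matches(0b11110000), matches(0b11000010), matches(0b01010100))
```

The three sample values are the ones used in the worked tables below, and they come out true, true, false.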

Logic explained:

First of all, in both testOnes and testZeros you have the bits-in-interest set to 1, the rest to 0.

testOnes testZeros
11000000 11110011

Then x & testOnes will zero out all bits we don't want to test for being ones (note the difference between & and &&: & performs the logical AND operation bitwise, whereas && is the logical AND on the whole number).

testOnes
11000000
x        x & testOnes
11110000 11000000
11000010 11000000
01010100 01000000

Now at most the bits we are testing for being 1 can be 1, but we don't know if all of them are 1s: by inverting the result (~(x & testOnes)), we get all numbers we don't care about being 1s and the bits we would like to test are either 0 (if they were 1) or 1 (if they were 0).

testOnes
11000000
x        ~(x & testOnes)
11110000 00111111
11000010 00111111
01010100 10111111

By bitwise-AND-ing it with testOnes we get 0 if the bits-in-interest were all 1s in x, and non-zero otherwise.

testOnes
11000000
x        ~(x & testOnes) & testOnes
11110000 00000000
11000010 00000000
01010100 10000000

At this point we have: 0 if all bits we wanted to test for 1 were actually 1s, and non-0 otherwise, so we perform a logical NOT to turn the 0 into true and the non-0 into false.

x        !(~(x & testOnes) & testOnes)
11110000 true
11000010 true
01010100 false

The test for zero-bits is similar, but uses bitwise-OR (|) instead of bitwise-AND (&). First, we flip x, so the should-be-0 bits become should-be-1. OR-ing with testZeros then turns all the bits we don't care about into 1 while keeping the rest, so at this point we have all-1s if the should-be-0 bits in x were indeed 0, and something other than all-1s otherwise. We flip the result again to get 0 in the first case and non-0 in the second, then apply logical NOT (!) to convert that to true (first case) or false (second case).

testZeros
11110011
x        ~x       ~x | testZeros ~(~x | testZeros) !(~(~x | testZeros))
11110000 00001111 11111111       00000000          true
11000010 00111101 11111111       00000000          true
01010100 10101011 11111011       00000100          false

Note: You need to realize that we have performed 4 operations for each test, so 8 total. Depending on the number of bits you want to test, this might still be less than checking each bit individually.

Arithmetic testing

Testing for equality/difference is easy: XOR the number with the tested one -- you get 0 if all bits were equal (thus the numbers were equal), and non-0 if at least one bit was different (thus the numbers were different). Applying logical NOT turns the equal case into true and the different case into false.
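In code, the XOR equality test is a one-liner (Python sketch, helper name made up):

```python
# XOR equality test: x ^ y is 0 exactly when every bit of x matches y.

def equal_via_xor(x, y):
    return (x ^ y) == 0

print(equal_via_xor(0b11001010, 0b11001010),  # identical      -> True
      equal_via_xor(0b11001010, 0b11001000))  # differ in bit 1 -> False
```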

To test for inequality, however, you are out of luck most of the time, at least with logical operations alone. You are correct that checking against a power of two (e.g. 16 in your question) can be done with logical operations (bitwise-AND and a test for 0), but for numbers that are not powers of two, this is not so easy. As an example, let's test for x > 5: the pattern is 00000101, so how do you test? Any number with a 1 among its five most significant bits is certainly greater, but 6 (00000110) is also larger even though all of its top five bits are 0.

The best you could do is test whether the number is at least twice as large as the highest power of two in the test number (4 for 5 in the above example). If it is, then it is larger than the original; otherwise, it has to contain at least that highest power of two, and you repeat the same test on the less significant bits. As you can see, the number of operations varies with the number of 1 bits in the test number.
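One concrete way to do a bitwise greater-than comparison is to scan from the most significant bit down and let the first differing bit decide. This Python sketch is a standard trick, not necessarily the exact power-of-two scheme described above:

```python
# Bitwise unsigned comparison: scan bits from MSB to LSB; the first position
# where x and k differ decides which is larger.

def greater_than(x, k, bits=8):
    for i in range(bits - 1, -1, -1):
        xb = (x >> i) & 1
        kb = (k >> i) & 1
        if xb != kb:          # first differing bit decides the comparison
            return xb == 1    # x holds the 1 at this position -> x is larger
    return False              # all bits equal -> x == k, so not greater

print(greater_than(6, 5), greater_than(5, 5), greater_than(4, 5))
```

Unlike the scheme above, this always runs a fixed number of steps (one per bit), regardless of how many 1 bits the test number has.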

Linked bits

Here, XOR is your friend: for two bits XOR yields 0 if they are the same, and 1 otherwise.

I do not know of a simple way to perform the test, but the following algorithm should help:

Assume you need bits b1, ..., bn to be the same (all 1s or all 0s). First zero out all the other bits (bitwise-AND with a mask, as above), then isolate each bit of the test pattern and shift them all to the same position (the least significant bit, for convenience). Now XOR each aligned bit against the first one: every result must be 0 if the bits in the original number were the same. Note that you need to test each pairwise result as you go; testing only a single XOR of all the bits together can report a false match for larger numbers of linked bits, because an even number of differing 1s cancels out.

testLinks
00110000
x        x & testLinks
11110000 00110000
11000010 00000000
01010100 00010000

x        x's bits isolated isolated bits shifted
11110000 00100000          00000001
         00010000          00000001
11000010 00000000          00000000
         00000000          00000000
01010100 00000000          00000000
         00010000          00000001

x        x's bits XOR'd result
11110000 00000000       true (1st round, even result)
11000010 00000000       true (1st round, even result)
01010100 00000001       false (1st round, odd result)

Note: In C-like languages the XOR operator is ^.

Note: How do you line bits up at the same position? Bit-shifting. Shift-left (<<) shifts all bits to the left, "losing" the most significant bit and "introducing" a 0 at the least significant bit, essentially multiplying the number by 2; shift-right (>>) operates similarly, shifting bits to the right and essentially integer-dividing the number by 2. For signed types, however, most implementations "introduce" a copy of the most significant bit that was there already (an arithmetic shift, thus keeping negative numbers negative), while unsigned values are shifted in with 0s.
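There is also a shortcut worth knowing: after masking out everything but the linked bits, the result must be either 0 (all linked bits were 0) or the mask itself (all were 1). A Python sketch of that equivalent test, using the two A positions from the pattern 11AA00XX:

```python
# Shortcut test for linked bits: after masking, the result must be either
# all zeros or the full mask -- anything in between means the bits disagree.

LINK_MASK = 0b00110000  # the two A positions in the pattern 11AA00XX

def linked_bits_equal(x):
    masked = x & LINK_MASK
    return masked == 0 or masked == LINK_MASK

print(linked_bits_equal(0b11110000),  # AA = 11 -> True
      linked_bits_equal(0b11000010),  # AA = 00 -> True
      linked_bits_equal(0b01010100))  # AA = 01 -> False
```

This needs only one AND and two comparisons, for any number of linked bits, and agrees with the worked tables above on all three sample values.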

answered Sep 21 '22 by Attila

Bitwise

Ok, so we look at bitwise operations, as that is the most efficient way of doing what you want. For clarity (and reference), bitwise values converted to decimal are

00000001 =   1
00000010 =   2
00000100 =   4
00001000 =   8
00010000 =  16
00100000 =  32
01000000 =  64
10000000 = 128

Now, given a bit pattern of 11XX00XX on the variable x we would perform the following checks:

x AND 128 == true
x AND 64  == true
x AND 8   == false
x AND 4   == false

If all those conditions are true, then the value matches the pattern. Essentially, you are checking the following conditions:

1XXXXXXX AND 10000000 == true
X1XXXXXX AND 01000000 == true
XXXX0XXX AND 00001000 == false
XXXXX0XX AND 00000100 == false

To put that together in programming language parlance (I'll use C#), you'd look for

if (((x & 128) != 0) && ((x & 64) != 0) && ((x & 8) == 0) && ((x & 4) == 0))
{
    // We have a match
}

For the more complicated bit pattern of 11AA00XX, you would add the following condition:

NOT ((x AND 32) XOR (x AND 16)) == true

What this does is first check x AND 32, yielding either 0 or 32 depending on the value of that bit in x; the same check on the other bit, x AND 16, yields 0 or 16. Since those two masked values sit at different bit positions, in real code you shift them into the same position before comparing. The XOR operation then checks for a difference in the bits, returning non-zero if the bits are DIFFERENT and 0 if the bits are the same. From there, since we want a match when they are the same, we NOT the whole clause, so it is true exactly when the bits agree.
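Filling in the alignment detail the prose glosses over, here is a Python sketch of the check (the helper name is made up):

```python
# Linked-bit check for bits 5 and 4 (the AA positions in 11AA00XX):
# mask each bit, shift both down to position 0, then XOR and test for 0.

def bits_5_and_4_match(x):
    bit5 = (x & 32) >> 5
    bit4 = (x & 16) >> 4
    return (bit5 ^ bit4) == 0  # XOR is 0 exactly when the two bits are equal

print(bits_5_and_4_match(0b11110000),  # AA = 11 -> True
      bits_5_and_4_match(0b11000011),  # AA = 00 -> True
      bits_5_and_4_match(0b11100011))  # AA = 10 -> False
```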


Arithmetically

On the arithmetic side, you'd look at using a combination of division and modulus operations to isolate the bit in question. To work with division, you start by finding the highest possible power of two that the number can be divided by. In other words, if you have x=65, the highest power of two is 64.

Once you've done the division, you then use modulus to take the remainder after division. As in the example above, given x=65, x MOD 64 == 1. With that number, you repeat what you did before, finding the highest power of two, and continuing until the modulus returns 0.
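The division/modulus procedure above can be sketched as follows (Python, helper name made up; it peels off the set bits of a number using only arithmetic, no bit operators):

```python
# Extract the set bits of x arithmetically: find the highest power of two
# not exceeding x, divide to test the bit, take the modulus, and repeat.

def set_bits_arithmetically(x):
    powers = []
    p = 1
    while p * 2 <= x:        # highest power of two <= x
        p *= 2
    while x > 0:
        if x // p == 1:      # division isolates the bit at this power
            powers.append(p)
            x = x % p        # modulus keeps the remainder below p
        p //= 2
    return powers

print(set_bits_arithmetically(65))  # 65 = 64 + 1
```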

answered Sep 18 '22 by saluce