Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I evaluate a large number of boolean ANDs quickly?

I need to run millions of queries of the following kind.

Each input consists of a small collection (<100) of boolean vectors of various sizes (<20000 elements), with each having a few 1s and many 0s :

A = [ 0 0 0 1 0 0 0 0 0 0 0 ... ]
B = [ 0 0 0 0 1 0 ... ]
...

I also have many (>20000) boolean AND expressions. These expressions are constant for all queries.

S[1] = A[10] AND B[52] AND F[15] AND U[2]
S[2] = I[8] AND Z[4]
...

Each expression can reference zero or one element from each vector. Variables rarely appear in more than one expression. For each query, output is the set of true expressions.

What's a good algorithm to compute the queries quickly, preferably more quickly than evaluating each expression in order? The algorithm needs to run once for each input, and there are millions of inputs to run on, so speed is important. Since the expressions are constant, I can optimize them ahead of time. I'm working with C.

like image 473
moc Avatar asked Jul 28 '15 18:07

moc


3 Answers

Return early. As soon as you find a false boolean, you know that the and expression will return false, so don't check the rest.

In C, you get this behavior by default in hardcoded boolean expressions:

(A[10] && B[52] && F[15] && U[2])

Depending on how predictable the inputs are, you might be able to get a lot of performance out of ranking each expression's variables per likelihood of being false, and reordering the expression from most likely to less likely.

like image 150
Emilio M Bumachar Avatar answered Oct 23 '22 05:10

Emilio M Bumachar


You seem to be using lots of data. It's a guess, but I'd say you'll get optimal behavior by preprocessing your expressions into cache optimal passes. Consider the two expressions given:

S[1] = A[10] AND B[52] AND F[15] AND U[2]
S[2] = I[8] AND Z[4]

rewrite these as:

S[1] = 1;
S[1] &= A[10];
S[1] &= B[52];
S[1] &= F[15];
S[1] &= U[2];

S[2] = 1;
S[2] &= I[8];
S[2] &= Z[4];

Then sort all of the expressions together to create one long list of operations:

S[1] = 1;
S[2] = 1;

S[1] &= A[10];
S[1] &= B[52];
S[1] &= F[15];
S[2] &= I[8];
S[1] &= U[2];
S[2] &= Z[4];

Consider the size of the machine cache on hand. We want all of the input vectors in cache. That probably can't happen so we know we will be pulling the input vectors and the result vectors in and out of memory multiple times. We want to partition the available machine cache into three parts: input vector chunk, result vector chunk, and some working space (where our current list of operations will be pulled from).

Now, walk the list of expressions pulling out expressions that fall into the A-I and S[1]-S[400] range. Then walk again pulling J-T (or whatever fits in cache) and pull those operations next, once you get to the end of the operations list repeat for s[401]-s[800]. This is the final order of execution for the operations. Note that this can be parallelized without contention across the S bands.

The down side is that you do not get the early out behavior. The upside is you only have cache failures as you transition blocks of computation. For such a large data set I suspect this (and the elimination of all branching) will overwhelm the early out advantage.

If you still want to try to use the early out optimization you can it is just harder to implement. Consider: once you have your cache bracket A-I & S[1]-s[400], and you have created a list of operations across that bracket:

S[1] &= A[10];
S[1] &= B[52];
S[1] &= F[15];
S[2] &= I[8];

You can then reorder the operations to group them by S[x] (which this example already was). Now if you find A[10] is false you can "early out" to the S[2] block. As far as how to implement this? Well, your operations now need to know how many to skip forward from the current operation:

Operation[x  ] => (S[1] &= A[10], on false, x+=3)
Operation[x+1] => (S[1] &= B[52], on false, x+=2)
Operation[x+2] => (S[1] &= F[15], on false, x+=1)
Operation[x+3] => (S[2] &= I[8]...

Again, I suspect simply adding the branching in will be slower than just performing all of the other work. This is not a full early out process since the when you move to the next input block chunk you'll have to reinspect each S[x] value accessed to determine if it has already failed and should be skipped.

like image 4
Speed8ump Avatar answered Oct 23 '22 06:10

Speed8ump


  1. Convert your inputs to packed form (list of indexes for non-zero elements). To make whole approach quicker than evaluating each expression in order, you need to process several elements at once, using either compiler intrinsics of bit twiddling (assuming that each input boolean occupies just one byte, or even better one bit).
  2. Pre-process 'AND' expressions to arrays mapping indexes from packed input arrays to the expression to which it belongs. (But if some variable appears in more than one expression, it will require some special handling).
  3. Initialize counters for expressions to 0.
  4. Read packed input arrays and increment counters for corresponding expressions.
  5. Expressions having counter equal to their number of terms are 'true', others are 'false'.
like image 2
Evgeny Kluev Avatar answered Oct 23 '22 05:10

Evgeny Kluev