Optimizing boolean logic tree evaluation

I've got a lot of true/false results saved as bits in long[] arrays, and I have a huge number of them (millions and millions of longs).

For example, say I have only five results, I'd have:

+----- condition 5 is true
|
|+---- condition 4 is false
||
||+--- condition 3 is true
|||
|||+-- condition 2 is true
||||
||||+- condition 1 is false
10110

I also have a few trees representing statements like:

condition1 AND (condition2 OR (condition3 AND condition4))

The trees are very simple but very long. They basically look like this (it's an oversimplification below, just to show what I've got):

class Node {
    int operator;         // e.g. AND, OR, NOT (or a leaf marker)
    List<Node> nodes;     // subnodes (null or empty for a leaf)
    int conditionNumber;  // for a leaf: which bit to test in the long[]
}

Basically, either the Node is a leaf, in which case it has a condition number (matching one of the bits in the long[] arrays), or it is not a leaf and hence refers to several subnodes.

They're simple, yet they allow me to express complicated boolean expressions. It works great.

So far so good, everything is working great. However I do have a problem: I need to evaluate a LOT of expressions, determining if they're true or false. Basically I need to do some brute-force computation for a problem for which there's no known better solution than brute-forcing.

So I need to walk the tree and answer either true or false depending on the content of the tree and the content of the long[].

The method I need to optimize looks like this:

boolean solve( Node node, long[] trueorfalse ) {
   ...
}

where on the first call the node is the root node and, on subsequent calls, a subnode (being recursive, that solve method calls itself).

Knowing that I'll only have a few trees (maybe up to a hundred or so) but millions and millions of long[] to check, what steps can I take to optimize this?

The obvious recursive solution passes parameters (the (sub)tree and the long[]; I could get rid of the long[] by not passing it as a parameter) and is quite slow with all the recursive calls etc. I need to check which operator is used (AND, OR, NOT, etc.), so there are quite a lot of if/else or switch statements involved.
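
Stripped down, it looks more or less like this (simplified; the operator constants and the exact bit indexing below are only for illustration):

static final int LEAF = 0, AND = 1, OR = 2, NOT = 3;

boolean solve(Node node, long[] trueorfalse) {
    switch (node.operator) {
        case LEAF: {
            // condition n is stored as bit (n % 64) of trueorfalse[n / 64]
            int n = node.conditionNumber;
            return (trueorfalse[n >>> 6] & (1L << (n & 63))) != 0;
        }
        case AND:
            for (Node child : node.nodes)
                if (!solve(child, trueorfalse)) return false; // short-circuit
            return true;
        case OR:
            for (Node child : node.nodes)
                if (solve(child, trueorfalse)) return true;   // short-circuit
            return false;
        case NOT:
            return !solve(node.nodes.get(0), trueorfalse);
        default:
            throw new IllegalStateException("unknown operator: " + node.operator);
    }
}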

I'm not looking for another algorithm (there isn't one), so I'm not looking to go from O(x) to O(y) where y would be smaller than x.

What I'm looking for is a "times x" speedup: if I can write code performing 5x faster, then I'll have a 5x speedup, and that's it, and I'd be very happy with it.

The only enhancement I see as of now (and I think it would be a huge "times x" speedup compared to what I have now) would be to generate bytecode for every tree and have the logic for every tree hardcoded into a class. It should work well because I'll only ever have a hundred or so trees (but the trees aren't fixed: I cannot know in advance what the trees are going to look like, otherwise it would be trivial to simply hardcode every tree manually).

Any idea besides generating bytecode for every tree?

Now if I want to try the bytecode generation route, how should I go about it?

asked Apr 11 '11 by SyntaxT3rr0r

3 Answers

In order to maximize the opportunities for shortcut evaluation, you need to do your own branch prediction.

You might want to profile it, tallying

  • which AND branches evaluate to false
  • which OR branches result in true

You can then reorder the tree relative to the weights that you found in the profiling step. If you want/need to be particularly nifty, you can devise a mechanism that detects the weighting for a certain dataset during runtime, so you can reorder the branches on the fly.

Note that in the latter case, it might be advisable not to reorder the actual tree (with respect to storage efficiency and correctness of results while still executing), but rather to devise a tree-node visitor (traversal algorithm) that is able to locally sort the branches according to the 'live' weights.

I hope all of this makes sense, because I realize the prose version is dense. However, like Fermat said, the code example is too big to fit into this margin :)
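
That said, just the static reorder pass is small enough to sketch (rough and untested; the trueCount/falseCount/evalCount fields are placeholders you would fill during the profiling run, and AND/OR are whatever operator constants your Node uses):

// needs java.util.Collections and java.util.Comparator
void reorder(Node node) {
    if (node.nodes == null || node.nodes.isEmpty()) return;
    for (Node child : node.nodes) reorder(child);
    if (node.operator == AND) {
        // most-often-false child first: the AND bails out as early as possible
        Collections.sort(node.nodes, new Comparator<Node>() {
            public int compare(Node a, Node b) {
                return Double.compare(rate(b.falseCount, b.evalCount),
                                      rate(a.falseCount, a.evalCount));
            }
        });
    } else if (node.operator == OR) {
        // most-often-true child first: the OR succeeds as early as possible
        Collections.sort(node.nodes, new Comparator<Node>() {
            public int compare(Node a, Node b) {
                return Double.compare(rate(b.trueCount, b.evalCount),
                                      rate(a.trueCount, a.evalCount));
            }
        });
    }
}

double rate(long hits, long total) { return total == 0 ? 0.0 : (double) hits / total; }

Reordering is safe here because AND and OR are commutative: the result doesn't change, only how quickly you reach it.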

answered Nov 16 '22 by sehe


There is a simple and fast way to evaluate boolean operations like this in C. Assuming you want to evaluate z=(x op y) you can do this:

 z = result[op+x+(y<<1)];

So op will be a multiple of 4 to select your operation (AND, OR, XOR, etc.), and you create a lookup table for all possible answers. If this table is small enough, you can encode it into a single value and use a right shift and a mask to select the output bit:

z = (MAGIC_NUMBER >> (op+x+(y<<1))) & 1;
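
To make that concrete, here is one possible encoding (a sketch in Java, since that's what the question uses; the offsets and constant names are my own), with AND at offset 0, OR at offset 4 and XOR at offset 8:

// Truth tables packed four bits per operator.
// Within each nibble the index is x + (y << 1), i.e. the entries are (x,y) = (0,0),(1,0),(0,1),(1,1).
static final int AND_OP = 0, OR_OP = 4, XOR_OP = 8;
static final int MAGIC_NUMBER =
          (0x8)        // AND: 1 only for (1,1)            -> 1000
        | (0xE << 4)   // OR : 1 for everything but (0,0)  -> 1110
        | (0x6 << 8);  // XOR: 1 for (1,0) and (0,1)       -> 0110

// x and y must be 0 or 1
static int eval(int op, int x, int y) {
    return (MAGIC_NUMBER >> (op + x + (y << 1))) & 1;
}

So eval(AND_OP, 1, 1) gives 1, eval(OR_OP, 0, 0) gives 0, and so on.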

That would be the fastest way to evaluate large numbers of these. Of course you'll have to split operations with multiple inputs into trees where each node has only 2 inputs. There is no easy way to short-circuit this, however. You can convert the tree into a list where each item contains the operation number and pointers to the 2 inputs and the output. Once in list form, you can use a single loop to blow through that one line a million times very quickly.
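
That list form might look something like this (sketch, field names are mine; the leaf condition bits get copied into the first slots before the loop runs):

static class Op {
    int op;        // AND_OP, OR_OP or XOR_OP: the shift offset into MAGIC_NUMBER
    int in1, in2;  // indices of the two input slots
    int out;       // index of the output slot
}

static boolean evaluate(Op[] program, int[] slots) {
    // slots[0..k] hold the leaf condition bits (0 or 1); each op reads two slots and writes one
    for (Op o : program) {
        slots[o.out] = (MAGIC_NUMBER >> (o.op + slots[o.in1] + (slots[o.in2] << 1))) & 1;
    }
    // by construction the last op writes the root's result
    return slots[program[program.length - 1].out] != 0;
}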

For small trees, this is a win. For larger trees it's probably not, because short-circuiting reduces the average number of children that need to be evaluated per node from 2 to about 1.5, and that adds up to a huge win for large trees. YMMV.

EDIT: On second thought, you can use something like a skip-list to implement short-circuiting. Each operation (node) would include a compare value and a skip-count. If the result matches the compare value, you can bypass the next skip-count entries. So the list would be created from a depth-first traversal of the tree, and the first child would include a skip count equal to the size of the other child. This adds a bit more complexity to each node evaluation but allows short-circuiting. A careful implementation could do it without any condition checking (think 1 or 0 times the skip-count).

answered Nov 16 '22 by phkahler


I think your bytecode idea is the right direction. What I would do, regardless of language, is write a precompiler. It would walk each tree and use print statements to translate it into source code, such as:

((word&1) && ((word&2) || ((word&4) && (word&8))))

That can be compiled on the fly whenever the trees change, and the resulting byte code / dll loaded, all of which takes under a second.

The thing is, at present you are interpreting the contents of the trees. Turning them into compiled code should make them run 10-100 times faster.
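
The tree-to-source walk itself is tiny. Something along these lines (untested sketch; it assumes the Node fields from the question plus some operator constants, emits the != 0 tests Java needs, and handles a single 64-bit word like the example above; indexing into the long[] for more than 64 conditions is a straightforward extension):

static String toSource(Node node) {
    switch (node.operator) {
        case LEAF:
            return "((word & 0x" + Long.toHexString(1L << node.conditionNumber) + "L) != 0)";
        case NOT:
            return "(!" + toSource(node.nodes.get(0)) + ")";
        case AND:
        case OR: {
            String joiner = (node.operator == AND) ? " && " : " || ";
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < node.nodes.size(); i++) {
                if (i > 0) sb.append(joiner);
                sb.append(toSource(node.nodes.get(i)));
            }
            return sb.append(")").toString();
        }
        default:
            throw new IllegalStateException("unknown operator");
    }
}

Wrap the returned expression in a boolean method, compile, and load.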

ADDED in response to your comments on not having the JDK. Then, if you can't generate Java bytecode, I would try to write my own bytecode interpreter that would run as fast as possible. It could look something like this:

while(iop < nop){
  switch(code[iop++]){
    case BIT1: // check the 1 bit and stack a boolean
      stack[nstack++] = ((word & 1) != 0);
      break;
    case BIT2: // check the 2 bit and stack a boolean
      stack[nstack++] = ((word & 2) != 0);
      break;
    case BIT4: // check the 4 bit and stack a boolean
      stack[nstack++] = ((word & 4) != 0);
      break;
    // etc. etc.
    case AND: // pop 2 booleans and push their AND
      nstack--;
      stack[nstack-1] = (stack[nstack-1] && stack[nstack]);
      break;
    case OR: // pop 2 booleans and push their OR
      nstack--;
      stack[nstack-1] = (stack[nstack-1] || stack[nstack]);
      break;
  }
}

The idea is to cause the compiler to turn the switch into a jump table, so it performs each operation with the smallest number of cycles. To generate the opcodes, you can just do a postfix walk of the tree.
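
The walk could look something like this (sketch; BIT_BASE + n stands in for the BITn opcodes above, AND_OPCODE/OR_OPCODE for AND/OR, and nodes with more than two children get folded pairwise):

static void emit(Node node, List<Integer> code) {
    if (node.operator == LEAF) {
        code.add(BIT_BASE + node.conditionNumber);  // "push condition n onto the stack"
        return;
    }
    for (int i = 0; i < node.nodes.size(); i++) {
        emit(node.nodes.get(i), code);
        if (i > 0) {
            // a AND b AND c  ->  a b AND c AND
            code.add(node.operator == AND ? AND_OPCODE : OR_OPCODE);
        }
    }
}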

On top of that, you might be able to simplify it by some manipulation of De Morgan's laws, so you could check more than one bit at a time.

answered Nov 16 '22 by Mike Dunlavey