Complex Boolean expression optimization, normal forms?

Tags:

I'm working on a streaming rules engine, and some of my customers have a few hundred rules they'd like to evaluate on every event that arrives at the system. The rules are pure (i.e. non-side-effecting) Boolean expressions, and they can be nested arbitrarily deeply.

Customers are creating, updating and deleting rules at runtime, and I need to detect and adapt to the population of rules dynamically. At the moment, the expression evaluation uses an interpreter over the internal AST, and I haven't started thinking about codegen yet.

As always, some of the predicates in the tree are MUCH cheaper to evaluate than others, and I've been looking for an algorithm or data structure that makes it easier to find the predicates that are cheap, and that are validly interpretable as controlling the entire expression. My mental headline for this pattern is "ANDs all the way to the root", i.e. any predicate for which all ancestors are ANDs can be interpreted as controlling.

Despite several days of literature search, reading about ROBDDs, CNF, DNF, etc., I haven't been able to close the loop from what might be common practice in the industry to my particular use case. One thing I've found that seems related is Analysis and optimization for boolean expression indexing but it's not clear how I could apply it without implementing the BE-Tree data structure myself, as there doesn't seem to be an open source implementation.

I keep half-jokingly mentioning to my team that we're going to need a SAT solver one of these days. 😅 I guess it would probably suffice to write a recursive algorithm that traverses the tree and keeps track of whether every ancestor is an AND or an OR, but I keep getting the "surely this is a solved problem" feeling. :)

Edit: After talking to a couple of friends, I think I may have a sketch of a solution!

Transform the expressions into Conjunctive Normal Form, in which, by definition, every node is in a valid short-circuit position.
Use the Tseitin algorithm to try to avoid exponential blowups in expression size as a result of the CNF transform
For each AND in the tree, sort it in ascending order of cost (i.e. cheapest to the left)
???
Profit!^Weval as usual :)

735

asked Jun 21 '21 17:06

Alex Cruise

Video Answer

1 Answers

You should seriously consider compiling the rules (and the predicates). An interpreter is 10-50x slower than machine code for the same thing. This is a good idea if the rule set doesn't change very often. Its even a good idea if the rules can change dynamically because in practice they still don't change very fast, although now your rule compiler has be online. Eh, just makes for a bigger application program and memory isn't much of an issue anymore.

A Boolean expression evaluation using individual machine instructions is even better. Any complex boolean equation can be compiled in branchless sequences of individual machine instructions over the leaf values. No branches, no cache misses; stuff runs pretty damn fast. Now, if you have expensive predicates, you probably want to compile code with branches to skip subtrees that don't affect the result of the expression, if they contain expensive predicates.

Within reason, you can generate any equivalent form (I'd run screaming into the night over the idea of using CNF because it always blows up on you). What you really want is the shortest boolean equation (deepest expression tree) equivalent to what the clients provided because that will take the fewest machine instructions to execute. This may sound crazy, but you might consider exhaustive search code generation, e.g., literally try every combination that has a chance of working, especially if the number of operators in the equation is relatively small. The VLSI world has been working hard on doing various optimizations when synthesizing boolean equations into gates. You should look into the the Espresso hueristic boolean logic optimizer (https://en.wikipedia.org/wiki/Espresso_heuristic_logic_minimizer)

One thing that might drive you expression evaluation is literally the cost of the predicates. if I have formula A and B, and I know that A is expensive to evaluate and usually returns true, then clearly I want to evaluate B and A instead.

You should consider common sub expression evaluation, so that any common subterm is only computed once. This is especially important when one has expensive predicates; you never want to evaluate the same expensive predicate twice.

I implemented these tricks in a PLC emulator (these are basically machines that evaluate buckets [like hundreds of thousands] of boolean equations telling factory actuators when to move) using x86 machine instructions for AND/OR/NOT for Rockwell Automation some 20 years ago. It outran Rockwell's "premier" PLC which had custom hardware but was essentially an interpreter.

You might also consider incremental evaluation of the equations. The basic idea is not to re-evaluate all the equations over and over, but rather to re-evaluate only those equations whose input changed. Details are too long to include here, but a patent I did back then explains how to do it. See https://patents.google.com/patent/US5623401A/en?inventor=Ira+D+Baxter&oq=Ira+D+Baxter

110

answered Oct 19 '22 10:10

Ira Baxter

Related questions
                            
                                Java performance in numerical algorithms
                            
                                Detect how much stress the mysql database is currently under with PHP
                            
                                JS tween how to improve?
                            
                                2D Spatial Data Structure suitable for Flocking Boids in Java
                            
                                Find out compilation optimization flag from executable
                            
                                Optimization of "static" loops
                            
                                How to minimize memory consumption of c++ program using copy-on-write?
                            
                                Why modifying global variable increases memory usage in Chrome
                            
                                Very high processing difference between two almost similar while loops
                            
                                Algorithm to find the union of multiple strings grouped by character index
                            
                                How to measure network pressure?
                            
                                Memory allocation optimized away by compilers
                            
                                Optimize JavaScript DrillDown code
                            
                                How come this both loop runs equally fast when compiled with -O3, but not when compiled with -O2?
                            
                                Why is a volatile local variable optimised differently from a volatile argument, and why does the optimiser generate a no-op loop from the latter?
                            
                                Counting bigrams real fast (with or without multiprocessing) - python
                            
                                Custom expected returns in the Portfolio Analytics package
                            
                                Build a matrix depending on two random walks in a graph
                            
                                Optimization techniques for backtracking regex implementations
                            
                                MySQL distinct performance

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Complex Boolean expression optimization, normal forms?

Tags:

optimization

boolean-expression

expression

sat

Alex Cruise

People also ask

Video Answer

1 Answers

Ira Baxter

Recent Activity

Donate For Us