 

Using Drools in a heavy batch process

Tags:

drools

We used Drools as part of a solution to act as a sort of filter in a very intensive processing application, running up to 100 rules against 500,000+ working memory objects. It turns out to be extremely slow. Does anybody else have experience using Drools in a batch-type processing application?

asked Sep 18 '08 by bmw0128

4 Answers

It kind of depends on your rules. 500K objects is reasonable given enough memory: the engine has to populate a RETE network in memory, so memory usage is a multiple of the 500K objects (space for the objects themselves plus space for the network structure, indexes, etc.). It's possible you are paging to disk, which would be really slow.

Of course, if you have rules that match combinations of the same type of fact, that can cause an explosion of combinations to try, which will be really, really slow even if you only have one rule. If you had any more information on the analysis you are doing, that would probably help with possible solutions.
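To make that concrete, here is a rough sketch of the kind of rule that triggers the cross-product (the Transaction and Duplicate fact classes are made up for illustration; this is not taken from your application):

    // A second pattern on the same fact type makes the engine consider
    // roughly N*(N-1) pairings, so a single rule like this can dominate
    // the runtime once N reaches 500K+ facts.
    public final class SelfJoinExample {

        // DRL for the problematic rule, kept as a plain string for illustration.
        static final String SELF_JOIN_DRL =
            "rule \"flag matching amounts\"\n" +
            "when\n" +
            "    $a : Transaction()\n" +
            "    // self-join: pairs every Transaction with every other one\n" +
            "    $b : Transaction( this != $a, amount == $a.amount )\n" +
            "then\n" +
            "    insert(new Duplicate($a, $b));\n" +
            "end\n";
    }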

answered Oct 24 '22 by Michael Neale

I've used Drools with a stateful working memory containing over 1M facts. With some tuning of both your rules and the underlying JVM, performance can be quite good after a few minutes of initial start-up. Let me know if you want more details.
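For reference, this is roughly what the setup looks like with the standard KIE API (the session name "batchSession" and the loadFacts() helper are placeholders, not details from my project):

    import org.kie.api.KieServices;
    import org.kie.api.runtime.KieContainer;
    import org.kie.api.runtime.KieSession;

    public final class BatchSessionExample {

        public static void main(String[] args) {
            KieServices ks = KieServices.Factory.get();
            KieContainer container = ks.getKieClasspathContainer();
            KieSession session = container.newKieSession("batchSession");
            try {
                // Insert everything up front; the network is built and indexed
                // as facts arrive, which is where the start-up minutes go.
                // Run with a generous heap, e.g. -Xmx8g, to avoid paging and GC thrash.
                for (Object fact : loadFacts()) {
                    session.insert(fact);
                }
                session.fireAllRules();
            } finally {
                session.dispose(); // release working memory
            }
        }

        private static Iterable<Object> loadFacts() {
            return java.util.Collections.emptyList(); // placeholder data source
        }
    }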

answered Oct 24 '22 by ShabbyDoo


I haven't worked with the latest version of Drools (the last time I used it was about a year ago), but back then our high-load benchmarks proved it to be utterly slow. A huge disappointment after having based much of our architecture on it.

At least one good thing I remember about Drools is that their dev team was available on IRC and very helpful. You might give them a try; they're the experts, after all: irc.codehaus.org #drools

answered Oct 24 '22 by Gilles


I'm just learning Drools myself, so maybe I'm missing something, but why is the whole batch of five hundred thousand objects added to working memory at once? The only reason I can think of is that there are rules that kick in only when two or more items in the batch are related.

If that isn't the case, then perhaps you could use a stateless session and assert one object at a time. I assume the rules will run 500k times faster in that case.
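Something along these lines, using the standard KIE API (the session name "filterSession" and loadBatch() are placeholders):

    import org.kie.api.KieServices;
    import org.kie.api.runtime.StatelessKieSession;

    public final class StatelessFilterExample {

        public static void main(String[] args) {
            StatelessKieSession session = KieServices.Factory.get()
                    .getKieClasspathContainer()
                    .newStatelessKieSession("filterSession");

            for (Object item : loadBatch()) {
                // Each execute() call is effectively insert + fireAllRules + dispose
                // for a single fact, so no 500K-wide join network is ever built.
                session.execute(item);
            }
        }

        private static Iterable<Object> loadBatch() {
            return java.util.Collections.emptyList(); // placeholder data source
        }
    }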

Even if that is the case, do all your rules need access to all 500k objects? Could you speed things up by applying the per-item rules one at a time, and then, in a second phase of processing, applying the batch-level rules using a different rulebase and working memory? This would not change the volume of data, but the RETE network would be smaller because the simple rules would have been removed.

An alternative approach would be to try to identify the related groups of objects and assert the objects in groups during the second phase, further reducing the volume of data in working memory as well as splitting up the RETE network.
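Roughly sketched, under the assumption that the phase-one rules mark each item in some way (the session names, survives() and loadBatch() below are all placeholders):

    import org.kie.api.KieServices;
    import org.kie.api.runtime.KieContainer;
    import org.kie.api.runtime.KieSession;
    import org.kie.api.runtime.StatelessKieSession;

    public final class TwoPhaseExample {

        public static void main(String[] args) {
            KieContainer container = KieServices.Factory.get().getKieClasspathContainer();

            // Phase 1: simple per-item rules, one object at a time.
            StatelessKieSession perItem = container.newStatelessKieSession("perItemSession");
            java.util.List<Object> survivors = new java.util.ArrayList<>();
            for (Object item : loadBatch()) {
                perItem.execute(item);
                if (survives(item)) {   // e.g. a flag the phase-1 rules set on the item
                    survivors.add(item);
                }
            }

            // Phase 2: batch-level rules over the (hopefully much smaller) remainder,
            // using a different rulebase and a fresh working memory.
            KieSession batch = container.newKieSession("batchRulesSession");
            try {
                survivors.forEach(batch::insert);
                batch.fireAllRules();
            } finally {
                batch.dispose();
            }
        }

        private static Iterable<Object> loadBatch() {
            return java.util.Collections.emptyList(); // placeholder data source
        }

        private static boolean survives(Object item) {
            return true; // placeholder: check whatever the phase-1 rules recorded
        }
    }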

answered Oct 24 '22 by Simon Gibbs