Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast ordered list matching algorithm in Java

I have a list of rules in the form

L1 -> (A, B, C)

L2 -> (D, E),

L3 -> (F, G, A),

L4 -> (C, A)

.....

This list contains ~30k such rules.

I have an input in the form (X, Y, Z)

This creates a method

List <Rule> matchRules(input)

Which belongs to a class RuleMatcher

I started with a very simple clear naive solution, in order to get the framework down, get something working.

public RuleMatcher(Collection<Rule> rules) {
   this.rules = rules;
}

public Collection<Rule> matchRules(List<Token> input) {
   List<Rule> matchingRules = new ArrayList<>();
   for(Rule r: this.rules) {
        if(r.matches(input)) {
            matchingRules.add(r);
        }
   }
   return matchingRules; 
}

Where matches is a very simple function that checks if the lengths are the same, and then checks each token as a for loop.

This matchRules function is called in the magnitude of billions of times.


Obviously this is a very poor implementation. According to my profiler at least half of the execution time is is spent in this matches function.

I was thinking of two possible solutions:

A. Some sort of Trie data structure holding the chains of rules which could be matched.

B. some sort of hash function. Each symbol is given a unique identifier. Unfortunately, there are about 8 thousand unique symbols so this might be difficult.

C. Make a hashmap conditioning on the size of the right hand side, the number of tokens in the rule. Unfortunately, the majority of the rules are about the same size, so this may not even be worthwhile.

D. Some awesome solution one of you come up with.

I hope somebody can shed some light on this problem.


Edit: A token is just an object with a unique number. For example "NN" is a token. Each instance of "NN" is exactly the same.

Matches code:

public boolean rhsMatches(List<Token> tokens) {
   if(tokens.size()!=rhsSize()) return false;
   for(int i = 0;i<rhsSize();i++) {
      if(!rightSide.get(i).equals(tokens.get(i)) {
        return false;
      }
   }
   return true;
}

Its not very pretty, but its simple.

like image 971
user498001 Avatar asked Jan 16 '14 16:01

user498001


1 Answers

Why not sort your rule list to begin with. Then you can binary search for the matching rule.

like image 115
ElKamina Avatar answered Sep 20 '22 17:09

ElKamina