Let's say that I have a list of regular expressions (read from an external source - file, database etc). I want to check which of these regular expressions a string matches.
I can create iterate through all these regular expression and match them, but the list could be a huge one and this is a critical operation.
I can combine all these regular expressions into one (with | between them), but then the problem is that I can identify only the first matched regular expression, not all.
Another idea could be to create an automaton for all these regular expression and to mark the final states with, let's say, the indexes of the corresponding regular expression. I was looking at http://cs.au.dk/~amoeller/automaton/, a library which seems capable to work with regular expressions and automaton, but not sure it can be extended to solve my problem.
Do you have any other ideas?
To clarify some comments, I've added a code sample:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PatternTest {
public static void main(String[] args) {
Pattern p = Pattern.compile("(a(?:b|c)a)|((?:a|b)ba)|(ab(?:a|c))");
Matcher m = p.matcher("aba");
System.out.println(m.matches());
System.out.println(m.groupCount());
for (int i = 0, n = m.groupCount(); i < n; i++) {
System.out.println(m.group(i));
}
}
}
will print out
true
3
aba
aba
null
As you can see only the first group is matched and I don't see a way to match the other two ones.
More findings - Using the automaton library above, the problem will reduce to the following: if you concatenate two or more automatons, how you can identify for a final state to which of the original automatons is corresponding?
dk.brics.automaton does not directly support this, but you can generalize the representation of automata (and the relevant automata operations) to distinguish between different kinds of accept states. Start by adding an int field, for example, to the State class and use this field whenever 'accept' is set.
I implemented such a solution based on dk.brics.automaton, you can find it here. https://github.com/fulmicoton/multiregexp
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With