Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mutually exclusive regular expressions

If I have a list of regular expressions, is there an easy way to determine that no two of them will both return a match for the same string?

That is, the list is valid if and only if for all strings a maximum of one item in the list will match the entire string.

It seems like this will be very hard (maybe impossible?) to prove definitively, but I can't seem to find any work on the subject.

The reason I ask is that I am working on a tokenizer that accepts regexes, and I would like to ensure only one token at a time can match the head of the input.

like image 439
captncraig Avatar asked Jun 03 '10 16:06

captncraig


2 Answers

If you're working with pure regular expressions (no backreferences or other features that cause them to recognize context-free or more complicated languages), what you ask is possible. What you can do is convert each regex to a DFA, then (since regular languages are closed under intersection) combine them into a DFA that recognizes the intersection of the two languages. If that DFA has a path from the start state to an accepting state, that string is accepted by both input regexen.

The problem with this is that the first step of the usual regex->DFA algorithm is to convert the regex to a NFA, then convert the NFA to a DFA. But that last step can result in an exponential blowup in the number of DFA states, so this will only be feasible for very simple regular expressions.

If you are working with extended regex syntax, all bets are off: context-free languages are not closed under intersection, so this method won't work.

like image 102
Jim Lewis Avatar answered Sep 30 '22 09:09

Jim Lewis


The Wkipedia article on regular expressions does state

It is possible to write an algorithm which for two given regular expressions decides whether the described languages are essentially equal, reduces each expression to a minimal deterministic finite state machine, and determines whether they are isomorphic (equivalent).

but gives no further hints.

Of course the easy way you are after is to run a lot of tests -- but we all know the shortcomings of testing as a method of proof.

like image 31
High Performance Mark Avatar answered Sep 30 '22 09:09

High Performance Mark