Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex: Determine if two regular expressions could match for the same input?

Tags:

regex

I want to find out if there could ever be conflicts between two known regular expressions, in order to allow the user to construct a list of mutually exclusive regular expressions.

For example, we know that the regular expressions below are quite different but they both match xy50:

'^xy1\d' '[^\d]\d2$' 

Is it possible to determine, using a computer algorithm, if two regular expressions can have such a conflict? How?

like image 756
Tom Avatar asked Aug 04 '10 22:08

Tom


People also ask

How do you know if two regular expressions are equivalent?

We say that two regular expressions R and S are equivalent if they describe the same language. In other words, if L(R) = L(S) for two regular expressions R and S then R = S. Examples. Are R and S equivalent?

How do you match expressions in regex?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).

What does '$' mean in regex?

Literal Characters and Sequences For instance, you might need to search for a dollar sign ("$") as part of a price list, or in a computer program as part of a variable name. Since the dollar sign is a metacharacter which means "end of line" in regex, you must escape it with a backslash to use it literally.


1 Answers

There's no halting problem involved here. All you need is to compute if the intersection of ^xy1\d and [^\d]\d2$ in non-empty.

I can't give you an algorithm here, but here are two discussions of a method to generate the intersection without resorting the construction of a DFA:

  • http://sulzmann.blogspot.com/2008/11/playing-with-regular-expressions.html

And then there's RAGEL

  • http://www.complang.org/ragel/

which can compute the intersection of regular expressions too.

UPDATE: I just tried out Ragel with OP's regexp. Ragel can generate a "dot" file for graphviz from the resulting state machine, which is terrific. The intersection of the OP's regexp looks like this in Ragel syntax:

('xy1' digit any*) & (any* ^digit digit '2')  

and has the following state machine:

enter image description here

While the empty intersection:

('xy1' digit any*) & ('q' any* ^digit digit '2') 

looks like this:

enter image description here

So if all else fails, then you can still have Ragel compute the intersection and check if it outputs the empty state machine, by comparing the generated "dot" file.

like image 112
Nordic Mainframe Avatar answered Sep 19 '22 14:09

Nordic Mainframe