Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Detecting if two regexes could possibly match the same string [duplicate]

Tags:

regex

Given two regular expressions, is it possible to detect whether there is any possible string that matches them both?

For example, given regexes A and ., I can see that string "A" matches them both. That's a simple case.

My question is for the broader case -- given any two valid regexes, would it be possible to definitively say whether there is any possible string that would match both regexes? Assume that there is no sample set of input strings to test. All I have are the regexes. I don't necessarily need to produce matching strings -- I just need to determine that there are possible strings that match both.

Will accept discussions for any of the common regex specifications -- .NET, Java, PERL, sed, grep, etc.

like image 797
Michael Gunter Avatar asked Mar 25 '14 21:03

Michael Gunter


People also ask

How can you tell if two expressions are equivalent?

We say that two regular expressions R and S are equivalent if they describe the same language. In other words, if L(R) = L(S) for two regular expressions R and S then R = S. Examples.

What does\\ mean in regex?

The backslash character (\) in a regular expression indicates that the character that follows it either is a special character (as shown in the following table), or should be interpreted literally. For more information, see Character Escapes. Escaped character. Description. Pattern.

What is s+ in regex?

On the other hand, the \S+ (uppercase S ) matches anything that is NOT matched by \s , i.e., non-whitespace. In regex, the uppercase metacharacter denotes the inverse of the lowercase counterpart, for example, \w for word character and \W for non-word character; \d for digit and \D or non-digit.

What is multiline matching?

Multiline option, it matches either the newline character ( \n ) or the end of the input string. It does not, however, match the carriage return/line feed character combination.


1 Answers

Basically, you want to test if the intersection of two RegExps is non-empty. Since intersection - just like complement - is a potentially expensive operation (it requires determinization of the NFA), it is not implemented in many RegExp implementations. One exception I know of is the BRICS Automaton Library, which allows enabling the intersection operator &.

To test the property in question, you could use the BRICS (Java) library like this:

RegExp re = new RegExp("(.) & (a)", RegExp.INTERSECTION); // Parse RegExp
Automaton a = re.toAutomaton(); // convert RegExp to automaton

if(a.isEmpty()) { // Test if intersection is empty
  System.out.println("Intersection is empty!");
}
else {
  // Print the shortest accepted string
  System.out.println("Intersection is non-empty, example: " + a.getShortestExample(true));
}
like image 147
misberner Avatar answered Oct 22 '22 22:10

misberner