Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Determining whether a regex is a subset of another

I have a large collection of regular expression that when matched call a particular http handler. Some of the older regex's are unreachable (e.g. a.c* ⊃ abc*) and I'd like to prune them.

Is there a library that given two regex's will tell me if the second is subset of the first?

I wasn't sure this was decidable at first (it smelled like the halting problem by a different name). But it turns out it's decidable.

like image 918
deft_code Avatar asked Sep 10 '13 21:09

deft_code


People also ask

How can you tell if two regex are equivalent?

We say that two regular expressions R and S are equivalent if they describe the same language. In other words, if L(R) = L(S) for two regular expressions R and S then R = S. Examples. Are R and S equivalent?

What does regex (? S match?

i) makes the regex case insensitive. (? s) for "single line mode" makes the dot match all characters, including line breaks.

How do you compare regular expressions?

The regular expression comparison is performed on the string representation of the left side of the comparison. That is, if the left side is an integer, the regular expression will behave is if the value 0 was the literal string "0" .


1 Answers

Trying to find the complexity of this problem lead me to this paper.

The formal definition of the problem can be found within: this is generally called the inclusion problem

The inclusion problem for R, is to test for two given expressions r, r′ ∈ R, whether r ⊆ r′.

That paper has some great information (summary: all but the simplest expressions are fairly complex), however searching for information on the inclusion problem leads one directly back to StackOverflow. That answer already had a link to a paper describing a passable polynomial time algorithm which should cover a lot of common cases.

like image 196
Kevin Stricker Avatar answered Oct 02 '22 18:10

Kevin Stricker