Is there a way to negate a regular expression?

Tags:

regex

algorithm

regular-language

Given a regular expression R that describes a regular language (no fancy backreferences), is there an algorithmic way to construct a regular expression R* that describes the language of all words except those described by R? It should be possible, as Wikipedia says:

The regular languages are closed under the various operations, that is, if the languages K and L are regular, so is the result of the following operations: […] the complement ¬L

For example, given the alphabet {a,b,c}, the complement of the language (abc*)+ is ((abc*)*a((a|c).*)?|(b|c).*|(abc*)+b.*)?
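The closure property quoted above is usually argued at the automaton level: take a complete DFA for the language and swap accepting and non-accepting states. The following is a minimal sketch of that idea for the example language (abc*)+, assuming a hand-built DFA; the state names `q0`, `q1`, `q2`, `dead` and the helper names are illustrative, not part of any library.

```python
# Hand-built DFA for (abc*)+ over the alphabet {a, b, c}.
# State names are hypothetical and chosen only for this sketch.
ALPHABET = "abc"
DEAD = "dead"
STATES = {"q0", "q1", "q2", DEAD}
ACCEPTING = {"q2"}
TRANSITIONS = {
    ("q0", "a"): "q1",  # start: a block must begin with 'a'
    ("q1", "b"): "q2",  # after 'a': a 'b' must follow
    ("q2", "c"): "q2",  # after 'ab': any number of 'c'
    ("q2", "a"): "q1",  # or the next 'abc*' block begins
}


def run(word, accepting):
    """Run the DFA on word; missing transitions lead to the dead state."""
    state = "q0"
    for ch in word:
        if ch not in ALPHABET:
            raise ValueError(f"symbol {ch!r} is not in the alphabet")
        state = TRANSITIONS.get((state, ch), DEAD)
    return state in accepting


def in_language(word):
    return run(word, ACCEPTING)


def in_complement(word):
    # Complementing a complete DFA: flip the set of accepting states.
    return run(word, STATES - ACCEPTING)
```

Flipping the accepting states is cheap; the cost shows up when converting a regular expression to a DFA and the complemented DFA back into a regular expression.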

As DPenner has already pointed out in the comments, the complement of a regular expression can be exponentially larger than the original expression. This makes complementing regular expressions unsuitable for implementing a negative partial expression syntax for searching purposes. Is there an algorithm that preserves the O(n*m) runtime characteristic of regular expression matching (where n is the size of the regex and m is the length of the input) and allows for negated subexpressions?

355

asked Mar 11 '13 11:03

fuz

2 Answers

Unfortunately, the answer given by nhahtdh in the comments is as good as we can do (so far). Whether a given regular expression generates all strings is PSPACE-complete. Since NP is contained in PSPACE, every problem in NP reduces to any PSPACE-complete problem, so an efficient solution to the universality problem would imply that P=NP.

If there were an efficient solution to your problem, would you be able to resolve the universality problem? Sure you would.

1. Use your efficient algorithm to generate a regular expression for the negation;
2. Determine whether the resulting regular expression generates the empty set.

Note that the problem "given a regular expression, does it generate the empty set" is fairly straightforward:

1. The regular expression {} generates the empty set.
2. (r + s) generates the empty set iff both r and s generate the empty set.
3. (rs) generates the empty set iff either r or s generates the empty set.
4. Nothing else generates the empty set.

Basically, it's pretty easy to tell whether a regular expression generates the empty set: just apply these rules recursively over the structure of the expression, which takes time linear in its size.
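The rules above can be sketched as a recursive function over a small regular expression syntax tree. This is only an illustration: the node classes (`Empty`, `Epsilon`, `Symbol`, `Union`, `Concat`, `Star`) and the function name are assumptions made for this example, not an existing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:
    """{} : the empty set."""


@dataclass(frozen=True)
class Epsilon:
    """The empty word."""


@dataclass(frozen=True)
class Symbol:
    char: str


@dataclass(frozen=True)
class Union:
    left: object   # (r + s)
    right: object


@dataclass(frozen=True)
class Concat:
    left: object   # (rs)
    right: object


@dataclass(frozen=True)
class Star:
    inner: object  # r*


def generates_empty_set(r):
    """Return True iff the expression r describes the empty language."""
    if isinstance(r, Empty):
        return True
    if isinstance(r, Union):
        return generates_empty_set(r.left) and generates_empty_set(r.right)
    if isinstance(r, Concat):
        return generates_empty_set(r.left) or generates_empty_set(r.right)
    # Epsilon, Symbol and Star always describe at least one word.
    return False
```

Each node is visited once, so the check is linear in the size of the expression it is given.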

(Note that while the above procedure is efficient in terms of the length of the negated expression, it might not be efficient in terms of the length of the original expression if the negated expression grows more than polynomially in the size of the original. However, if that were the case, we'd have the same result anyway, i.e., that your algorithm isn't really efficient, since it would take exponentially many steps to generate an exponentially longer output from a given input.)

181

answered Sep 29 '22 10:09

Patrick87

Wikipedia says: ... if there exists at least one regex that matches a particular set then there exist an infinite number of such expressions. We can deduce from this statement that there is an infinite number of expressions that describe the language of all words except those described by R.

Again (as @nhahtdh also tried to explain), the simplest way to address this question is to step outside the regular expression language itself. That is: match the candidate strings (a finite set to work with) against the original regular expression, and treat any failure to match as an actual match for the negation (which may itself contain infinitely many strings). So, the candidate strings for which the match is negative are a subset of the valid solutions.
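A minimal sketch of that idea in Python, assuming whole-string matching is what is wanted; the helper name `matches_complement` and the candidate list are made up for illustration. Note that Python's `re` module is a backtracking engine, so this shows the inverted test itself and makes no claim about O(n*m) matching.

```python
import re


def matches_complement(pattern, word):
    """A word belongs to the complement iff the original regex rejects it."""
    return pattern.fullmatch(word) is None


original = re.compile(r"(abc*)+")
candidates = ["", "a", "ab", "abcab", "aa", "cab"]

# Keep only the candidates that the original expression does not match.
in_negation = [w for w in candidates if matches_complement(original, w)]
```

This negates the match result for a whole expression only; it does not by itself give negated subexpressions inside a larger pattern.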

answered Sep 29 '22 10:09

Alex Filipovici
