Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to negate a regular expression?

Given a regular expression R that describes a regular language (no fancy backreferences). Is there an algorithmic way to construct a regular expression R* that describes the language of all words except those described by R? It should be possible as Wikipedia says:

The regular languages are closed under the various operations, that is, if the languages K and L are regular, so is the result of the following operations: […] the complement ¬L

For example, given the alphabet {a,b,c}, the inverse of the language (abc*)+ is (a|(ac|b|c).*)?


As DPenner has already pointed out in the comments, the inverse of a regular expresion can be exponentially larger than the original expression. This makes inversing regular expressions unsuitable to implement negative partial expression syntax for searching purposes. Is there an algorithm that preserves the O(n*m) runtime characteristic (where n is the size of the regex and m is the length of the input) of regular expression matching and allows for negated subexpressions?

like image 355
fuz Avatar asked Mar 11 '13 11:03

fuz


People also ask

What is ?! In regex?

The ?! n quantifier matches any string that is not followed by a specific string n.

Is it possible to reverse a string in regular expression?

reverse! # => "This string is backwards." s # => "This string is backwards." To reverse the order of the words in a string, split the string into a list of whitespaceseparated words, then join the list back into a string. The String#split method takes a regular expression to use as a separator.

How do I not match a character in regex?

To replace or remove characters that don't match a regex, call the replace() method on the string passing it a regular expression that uses the caret ^ symbol, e.g. /[^a-z]+/ .


2 Answers

Unfortunately, the answer given by nhahdtdh in the comments is as good as we can do (so far). Whether a given regular expression generates all strings is PSPACE-complete. Since all problems in NP are in PSPACE-complete, an efficient solution to the universality problem would imply that P=NP.

If there were an efficient solution to your problem, would you be able to resolve the universality problem? Sure you would.

  1. Use your efficient algorithm to generate a regular expression for the negation;
  2. Determine whether the resulting regular expression generates the empty set.

Note that the problem "given a regular expression, does it generate the empty set" is fairly straightforward:

  1. The regular expression {} generates the empty set.
  2. (r + s) generates the empty set iff both r and s generate the empty set.
  3. (rs) generates the empty set iff either r or s generates the empty set.
  4. Nothing else generates the empty set.

Basically, it's pretty easy to tell whether a regular expression generates the empty set: just start evaluating the regular expression.

(Note that while the above procedure is efficient in terms of the output length, it might not be efficient in terms of the input length, if the output length is more than polynomially faster than the input length. However, if that were the case, we'd have the same result anyway, i.e., that your algorithm isn't really efficient, since it would take exponentially many steps to generate an exponentially longer output from a given input).

like image 181
Patrick87 Avatar answered Sep 29 '22 10:09

Patrick87


Wikipedia says: ... if there exists at least one regex that matches a particular set then there exist an infinite number of such expressions. We can deduct from this statement that there is an infinite number of expressions that describe the language of all words except those described by R.

Again, (as also @nhahtdh tried to explain) the simplest algorithm to address this question is to extend the scope of evaluation outside the context of the regular expression language itself. That is: match the strings you want to exclude (which represent a finite subset to work with) by using the original regular expression and then treat any failure to match as an actual match (out of an infinite set of other possibilities). So, if the result of the match is negative, your candidate strings are a subset of the valid solutions.

like image 23
Alex Filipovici Avatar answered Sep 29 '22 10:09

Alex Filipovici