Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I turn any regex into an complement of itself without complex hand editing?

The following are pseudo examples, not real regex, but still an example of what I mean:


.* (anything)

-.* (NOT anything)

[A-Z] (Any letter A to Z, caps only)

-[A-Z] (NOT any letter A to Z, caps only)

EDIT: Changed inverse into complement in the question. Here's where the change was made: "turn any regex into an complement of itself "

like image 961
blunders Avatar asked Oct 20 '10 11:10

blunders


1 Answers

First of all, I believe you mean the complement of a regular expression, not it's inverse. The inverse of a regular expression doesn't make much sense; but if viewed as a function, I suppose you could say that the inverse of the matcher is the generator which generates all matching strings - or something. On the other hand, the complement of a language is all those strings not in the original language.

Then, there are two views to consider here:

Fundamentally

The complement of a regular language is regular. That means it's possible to generate an accepting DFA for the complement (and doing so is very simple, actually: just swap the non-accepting state set with the accepting state set). Any such DFA can be expressed as a regular expression - so in principle you can indeed make such a regex.

See the wikipedia article on Regular Languages as a starting point.

Practically

The typical perl-compatible regex syntax used in most modern languages nowadays does not have a complementation operator. For a complete regex, you can get something similar by using the negative lookahead operator: (?!X) will match a string precisely when X will not. However, this is a poor replacement for complement operator as you will not be able to use it as a part of a larger regex in the usual fashion; this regex doesn't "consume" input which means it behaves differently in conjunction with other operators.

For example, if you match numeric strings as [0-9]*, to match the entire string you'd prepend ^ and append $, but to use this technique to find the complement you'd need to write ^(?!^[0-9]*$).*$ - and the usual concatenation of such a negated regex is, as far as I can tell, undoable.

Somewhat ironically, the practical incarnation of regexes is theoretically more powerful due to backreferences, but practically less flexible since the language can't quite express the complement and intersection operations easily.

like image 188
Eamon Nerbonne Avatar answered Nov 05 '22 01:11

Eamon Nerbonne