Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to convert a PCRE to a POSIX RE?

Tags:

regex

posix

This interesting question Regex to match anything (including the empty string) except a specific given string concerned how to do a negative look-ahead in MySQL. The poster wanted to get the effect of

Kansas(?! State)

because MySQL doesn't implement look-ahead assertions, a number of answers came up the equivalent

Kansas($|[^ ]| ($|[^S])| S($|[^t])| St($|[^a])| Sta($|[^t])| Stat($|[^e]))

The poster pointed out that's a PITA to do for potentially lots of expressions.

Is there a script/utility/mode of PCRE (or some other package) that will convert a PCRE (if possible) to an equivalent regex that doesn't use Perl's snazzy features? I'm fully aware that some Perl-style regexes cannot be stated as an ordinary regex, so I would not expect the tool to do the impossible, of course!

like image 993
David M Avatar asked May 14 '10 21:05

David M


People also ask

What is Posix regex?

POSIX bracket expressions are a special kind of character classes. POSIX bracket expressions match one character out of a set of characters, just like regular character classes. They use the same syntax with square brackets. A hyphen creates a range, and a caret at the start negates the bracket expression.

What PCRE matching does?

PCRE tries to match Perl syntax and semantics as closely as it can. PCRE also supports some alternative regular expression syntax (which does not conflict with the Perl syntax) in order to provide some compatibility with regular expressions in Python, . NET, and Oniguruma.

Does Ruby use PCRE?

Ruby regular expressions are PCRE regular expressions as PCRE stands for Perl Compatible Regular Expressions and defines a particular syntax for the regular expressions it supports.


1 Answers

You don't want to do this. It isn't actually mindbogglingly difficult to translate the advanced features to basic features - it's just another flavor of compiler, and compiler writers are pretty clever people - but most of the things that the snazzy features solve are (a) impossible to do with a standard regex because they recognize non-regular languages, so you'd have to approximate them so that at least they work for a limited-length text or (b) possible, but only with a regex of exponential size. And 'exponential' is compsci-speak for "don't go there". You will get swamped in OutOfMemory errors and seemingly-infinite loops if you try to use an exponential solution on anything you would actually want to process.

In other words, Abandon all hope, ye who enter here. It is virtually always better to let the regex do what it's good at and do the rest with other tools. Even such a simple thing as inverting a regex is much, much easier solved with the original regex in combination with the negation operator than with the monstrosity that would result from an accurate regex inverter.

like image 79
Kilian Foth Avatar answered Oct 08 '22 18:10

Kilian Foth