I've seen Ruby and Perl programmers do some complicated code challenges entirely with regexes. The lookahead and lookbehind capabilities in Perl regexes make them more powerful than the regex implementations in most other languages. I was wondering how powerful they really are.
Is there an easy way to either prove or disprove that Perl regexes are Turing complete?
Programming Languages are typically defined as languages that are Turing Complete. Such languages must be able to process any computable function. Regex does not fit into this category.
In general, Perl uses a backtrack regex engine. Such an engine is flexible, easy to implement and very fast on a subset of regex. However, for other types of regex, for example when there is the | operator, it may become very slow. In the extreme case, its match speed is exponential in the length of the pattern.
Excluding any kind of embedded code, such as ?{ }
, they probably don't cover all of context-free, much less Turing Machines. They might, but to my knowledge, nobody has actually proven it one way or another. Given that people have been trying to solve certain context-free problems with Perl regexes for a while and haven't come up with a solution yet, it's likely that they are not context-free.
There is an interesting discussion to be had about what features are merely convenient, and which actually add power. For instance, matching 0n*1*0n (that's notation for "any number of zeros, followed by a one, followed by the same number of zeros as before") is not something that can be done with pure regexes. You can prove this can't be done with regexes using the Pumping Lemma, but the simple, informal proof is that the regex would have to count an arbitrary number of zeros, and regexes can't do counting.
However, backreferences can match that with:
/(0*) 1 \1/x;
So that means backreferences give you more power, and are not a mere convenience. What else might give us more power, I wonder?
Also, Perl6 "patterns" (they're not even pretending they're regexes anymore) are designed to look kinda like Perl5 regexes (so you don't need to relearn much), but they have enough features added to be fully context-free. They're actually designed so you can use them to alter the way the language is parsed within a lexical scope.
There are at least two discussions: Turing completeness and regular expressions and Are Perl patterns universal? with further references.
The consensus (to my untrained eye) seems to be that the answer is "no", but I am not sure if I understand everything correctly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With