
PCRE: backreferences not allowed in lookbehinds?

The PCRE regex /..(?<=(.)\1)/ fails to compile: "Subpattern references are not allowed within a lookbehind assertion." Interestingly it seems to be acceptable in lookaheads, like /(?=(.)\1)../, just not in lookbehinds.

Is there a technical reason why backreferences are not allowed in lookbehinds specifically?

Connor Smith asked Jun 06 '15 01:06



1 Answer

With Python's re module, group references are not supported in lookbehind, even if they match strings of some fixed length.


Lookbehinds don't support the full range of PCRE syntax. Concretely, when the regex engine reaches a lookbehind, it tries to determine its size, then jumps back that many characters to check the match.

This size determination brings you to a choice:

  • allow variable size: every lookbehind then has to be executed before the engine knows how far to jump back
  • disallow variable size: the engine can jump back directly

Although the first option would be the best for us (users), it is also the slowest and the hardest to implement. So PCRE settled on the second option. The Java regex engine, as another example, allows semi-variable lookbehinds: it only needs to be able to determine the maximum size.
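To make the trade-off concrete, here is a toy model in Python of the two strategies. It is only a sketch: the helper names `fixed_lookbehind` and `variable_lookbehind` are made up for illustration, and this is not how PCRE is actually implemented.

```python
import re

def fixed_lookbehind(text, pos, pattern, width):
    """Fixed size: jump back exactly `width` characters and match once."""
    start = pos - width
    if start < 0:
        return False  # not enough text behind the current position
    return re.fullmatch(pattern, text[start:pos]) is not None

def variable_lookbehind(text, pos, pattern):
    """Variable size: try every possible start position before `pos`."""
    return any(
        re.fullmatch(pattern, text[start:pos]) is not None
        for start in range(pos, -1, -1)
    )

text = "xaaab"
pos = text.index("b")  # current position of the (hypothetical) engine

print(fixed_lookbehind(text, pos, r"aa", 2))    # one comparison
print(variable_lookbehind(text, pos, r"xa*"))   # up to pos + 1 attempts
```

The fixed-size version does a single check per position, while the variable-size version may rescan everything behind the current position, which is roughly why the second option is cheaper.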


I looked at both PCRE and Python's re module.
In the PCRE documentation, I found nothing relevant apart from this error code:

COMPILATION ERROR CODES
25: lookbehind assertion is not fixed length

But in this case, the lookbehind assertion is fixed length.
Now, here is what the re documentation says:

The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not. Group references are not supported even if they match strings of some fixed length.
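The fixed-length rule quoted above can be checked directly with Python's re module. This is a minimal sketch; `compiles` is a helper made up for the example. The last pattern is the one from the question, and its result is deliberately not asserted: as far as I know, Python 3.5 and later accept some fixed-length group references in lookbehinds, so the outcome may depend on your version.

```python
import re

def compiles(pattern):
    """Return True if `pattern` compiles with re, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-length lookbehinds are accepted, variable-length ones are rejected.
for pattern in (r"(?<=abc)d", r"(?<=a|b)c", r"(?<=a*)b", r"(?<=a{3,4})b"):
    print(pattern, compiles(pattern))

# Pattern from the question: the result may vary with the Python version.
print(r"..(?<=(.)\1)", compiles(r"..(?<=(.)\1)"))
```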

There's our culprit. If you want, you can try Python's regex module, which seems to support variable-length lookbehinds.

zessx answered Oct 26 '22 11:10