
Perl fails stating `Variable length lookbehind not implemented`

Tags: regex, perl

I'm basically making a kind of word-boundary assertion. I want to test whether [abc] is not behind and [abc] is ahead, or vice versa.

So I tried to make a test for it and do the negation like this:

#!/usr/bin/perl
($_) = "abcdef" =~
/
((?&BB).*)
|
  (?!)
  (?<W>[abc])
  (?<NW>[^abc])
  (?<BB>
     (?<=(?&W))(?=(?&NW))
    |(?<=(?&NW))(?=(?&W))
  )
/x;
print;

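That doesn't work: Perl dies with the error in the title. However, if I write the character classes inline instead of calling the named groups: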

#!/usr/bin/perl
($_) = "abcdef" =~
/
  ((?&BB).*)
| (?!)
  (?<W>[abc])
  (?<NW>[^abc])
  (?<BB>
      (?<=[abc])(?=[^abc])
    | (?<=[^abc])(?=[abc])
  )
/x;
print;

This version works. What's going on here? Where's the variable-length lookbehind?

FYI, I know what the message means. What I'd like to know is why Perl thinks a named group is of variable length, and how I can get it to stop thinking that. To me, this looks like a bug. Does anyone else concur?

Using versions:

This is perl 5, version 14, subversion 4 (v5.14.4) built for cygwin-thread-multi
This is perl 5, version 16, subversion 2 (v5.16.2) built for i686-linux

EDIT

So I found a workaround, which is sufficient for my purposes.

#!/usr/bin/perl
$chars = qr/[abc]/;
$notChars = qr/[^abc]/;
($_) = "abcdef" =~
/
  ((?&BB).*)
| (?!)
  (?<BB>
      (?<=$chars)(?=$notChars)
    | (?<=$notChars)(?=$chars)
  )
/x;
print;
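
The same workaround can be wrapped in a small helper. This is only a sketch: `boundary_re` is a hypothetical name, not something from a module, and it assumes the characters passed in need no escaping inside a bracketed class. Like the patterns above, it does not treat the start or end of the string as a boundary.

```perl
use strict;
use warnings;

# Hypothetical helper: build a zero-width "boundary" assertion for the
# characters listed in $class (assumed safe to place inside [...]).
sub boundary_re {
    my ($class) = @_;
    my $in  = qr/[$class]/;     # exactly one character from the set
    my $out = qr/[^$class]/;    # exactly one character outside the set
    # After interpolation each lookbehind contains a plain character
    # class, so its length is visibly fixed at one character.
    return qr/(?: (?<=$in)(?=$out) | (?<=$out)(?=$in) )/x;
}

my $bb = boundary_re('abc');
my ($rest) = "abcdef" =~ /($bb.*)/;
print "$rest\n";    # prints "def"
```

Because the `qr//` objects are interpolated as text before the regex is compiled, no named subrule call ends up inside the lookbehind.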
Adrian asked Nov 20 '13 18:11


3 Answers

The lookbehind node simply looks at its children, sees that it contains a named subrule match, and decides that a named subrule match isn't necessarily fixed-length. It doesn't look inside of the named subrule to find out that it actually does have a fixed length, and I'm not sure if it can given the present state of the code. Since it's unable to determine a fixed length, it can't compile the lookbehind.

Perhaps the message should be along the lines of `Can't determine the length of '(?&W)' for use in lookbehind` instead of `Variable length lookbehind not implemented`.
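
To see the difference without aborting the whole script, the two forms can be compiled at run time and the failure trapped. This is a rough sketch: `try_compile` is a hypothetical helper, and the exact error text (or whether the subrule form is rejected at all) may differ on Perl versions other than the ones in the question.

```perl
use strict;
use warnings;

# Compile a pattern supplied as a string at run time, so that a regex
# compile error can be trapped with eval instead of killing the script.
sub try_compile {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/x };
    return "ok" if defined $re;
    (my $err = $@) =~ s/\s+\z//;    # trim the trailing newline
    return "failed: $err";
}

# The same one-character lookbehind, written inline and via a named
# subrule call. (?!) keeps the definition branch from ever matching.
my $inline  = '(?<=[abc])d';
my $subrule = '(?<=(?&W))d | (?!)(?<W>[abc])';

print "inline:  ", try_compile($inline),  "\n";
print "subrule: ", try_compile($subrule), "\n";
```

The inline form compiles because the lookbehind contains only a character class. The subrule form is the one the compiler is expected to refuse on the versions discussed here, since it only sees the `(?&W)` call and not what `W` contains.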

hobbs answered Oct 10 '22 18:10


Looks like it's here:

(?<=(?&W))(?=(?&NW))
    |(?<=(?&NW))(?=(?&W))

In Perl 5, the regex engine doesn't support lookbehind over a subpattern of variable length (the stuff inside the parentheses).

The lookbehind itself is the `(?<=` and whatever follows it up to the matching closing parenthesis.

Edit: Comments below led to a clarification of the question.

It looks like the variable length comes from using a named pattern for [^abc]: the lookbehind treats a call such as (?&NW) as something that could match text of any length, even though the character class [^abc] itself always matches exactly one character.

Perl 6 seems to support this in some fashion.

See the Perl 6 RFC on this issue: http://perl6.org/archive/rfc/72.html

Ryan J answered Oct 10 '22 19:10


A named capture does not always match text of a fixed length, which is why the regex engine does not allow a backreference inside a lookbehind. Here is an example of a variable-length named capture:

/(?|a(?<toto>ef)|b(?<toto>ghi))/
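
A short sketch of what that pattern does in practice (the helper name `toto_length` is just for illustration): the same named group captures two characters on one branch and three on the other.

```perl
use strict;
use warnings;

# Branch reset (?|...) lets both alternatives share one group named toto.
my $re = qr/(?|a(?<toto>ef)|b(?<toto>ghi))/;

# Return the length of what the named group captured, or undef if the
# string does not match at all.
sub toto_length {
    my ($string) = @_;
    return $string =~ $re ? length($+{toto}) : undef;
}

print toto_length("aef"),  "\n";    # prints 2
print toto_length("bghi"), "\n";    # prints 3
```

Since the length of `toto` depends on which branch matched, a reference to it cannot be given a single fixed length at compile time.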
Casimir et Hippolyte answered Oct 10 '22 19:10