I'm basically making a kind of word boundary assertion. I want to test whether [abc] is not behind and [abc] is ahead, and vice versa.
So I tried to make a test for it and do the negation like this:
#!/usr/bin/perl
($_) = "abcdef" =~
/
((?&BB).*)
|
(?!)
(?<W>[abc])
(?<NW>[^abc])
(?<BB>
(?<=(?&W))(?=(?&NW))
|(?<=(?&NW))(?=(?&W))
)
/x;
print;
This doesn't work: Perl refuses to compile it, complaining "Variable length lookbehind not implemented". However, if I do this:
#!/usr/bin/perl
($_) = "abcdef" =~
/
((?&BB).*)
| (?!)
(?<W>[abc])
(?<NW>[^abc])
(?<BB>
(?<=[abc])(?=[^abc])
| (?<=[^abc])(?=[abc])
)
/x;
print;
It does. What's going on here? Where's the variable length lookbehind?
FYI, I know what the message means. I'd like to know why Perl thinks a named group is of variable length, and how to get it to stop thinking that. To me, this looks like a bug. Does anyone else concur?
Using versions:
This is perl 5, version 14, subversion 4 (v5.14.4) built for cygwin-thread-multi
This is perl 5, version 16, subversion 2 (v5.16.2) built for i686-linux
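As an aside, the `(?!)` alternative in the snippets above is a trick for declaring named groups that never match on their own. Perl 5.10 and later also provide `(?(DEFINE)...)` for exactly that purpose. Below is a minimal sketch of the working pattern rewritten with it. This is a swapped-in idiom, not the asker's code, and the lookbehinds still use literal classes, since that is the form that compiles:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same "class boundary" idea, but the named group BB lives in a
# (?(DEFINE)...) block, which is never matched directly and only
# exists so that (?&BB) can call it.
my ($rest) = "abcdef" =~ m{
    ( (?&BB) .* )                    # capture everything from the first boundary on
    (?(DEFINE)
        (?<BB>
            (?<=[abc])(?=[^abc])     # [abc] behind, [^abc] ahead
          | (?<=[^abc])(?=[abc])     # [^abc] behind, [abc] ahead
        )
    )
}x;

print "$rest\n";   # prints "def"
```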
EDIT
So I found a workaround that is sufficient:
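#!/usr/bin/perl
use strict;
use warnings;
# Interpolating qr// objects pastes the class text into each lookbehind.
my $chars = qr/[abc]/;
my $notChars = qr/[^abc]/;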
($_) = "abcdef" =~
/
((?&BB).*)
| (?!)
(?<BB>
(?<=$chars)(?=$notChars)
| (?<=$notChars)(?=$chars)
)
/x;
print;
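The workaround generalizes to any single-character class. Interpolating a qr// object pastes its pattern text (for example `(?^:[abc])` on these Perl versions) into the lookbehind, so the compiler sees a plain one-character class rather than a named subroutine call. The sketch below wraps this in a helper. `class_boundary` is a hypothetical name introduced here, and it assumes the class contents are passed verbatim, so characters such as `]`, `-` or `^` would need escaping by the caller:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): build a \b-like
# assertion for a single-character class. The class text is inserted
# verbatim, so each lookbehind is fixed-length (one character).
sub class_boundary {
    my ($class) = @_;               # e.g. 'abc' for [abc]
    my $in  = qr/[$class]/;
    my $out = qr/[^$class]/;
    return qr/(?<=$in)(?=$out)|(?<=$out)(?=$in)/;
}

my $bb = class_boundary('abc');

# Everything from the first boundary onwards, as in the workaround.
my ($rest) = "abcdef" =~ /((?:$bb).*)/;
print "$rest\n";                    # prints "def"

# Split a string at every boundary between [abc] and [^abc] runs.
my @pieces = split /$bb/, "xxabcyyca";
print join('|', @pieces), "\n";     # prints "xx|abc|yy|ca"
```

Like the original pattern, this requires an actual [^abc] character on the other side, so (unlike `\b`) it never matches at the very start or end of the string.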
The lookbehind node simply looks at its children, sees that it contains a named subrule match, and decides that a named subrule match isn't necessarily fixed-length. It doesn't look inside of the named subrule to find out that it actually does have a fixed length, and I'm not sure if it can given the present state of the code. Since it's unable to determine a fixed length, it can't compile the lookbehind.
Perhaps the message should be something along the lines of "Can't determine the length of '(?&W)' for use in lookbehind" instead of "Variable length lookbehind not implemented".
Looks like it's here:
(?<=(?&W))(?=(?&NW))
|(?<=(?&NW))(?=(?&W))
In Perl 5, the regex engine doesn't support lookbehind over groups whose length can vary (the stuff in the parentheses).
The `?<=` and whatever follows it inside the parentheses is the lookbehind.
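For reference, `(?<=...)` is a positive lookbehind and `(?<!...)` a negative one; in the Perl versions discussed here, the pattern inside must have a length the compiler can determine up front. A small sketch with fixed-length lookbehinds (the sample string is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'cost: $42, tax: 7%';

# Positive lookbehind: digits immediately preceded by a literal '$'.
my ($dollars) = $text =~ /(?<=\$)(\d+)/;

# Negative lookbehind: digits not preceded by '$' or by another digit.
my ($plain) = $text =~ /(?<![\$\d])(\d+)/;

print "$dollars $plain\n";   # prints "42 7"
```

Both lookbehinds look at exactly one character, which is why they compile without complaint.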
Edit: The comments below led to a clarification of the question.
It looks like the variable length is inherent in the fact that you have a named pattern for [^abc], which could return matches of many different lengths. The variable length comes from the fact that any length of text can match the character class [^abc].
Perl 6 seems to support this in some fashion.
See this link to the RFC for Perl 6 regarding this issue http://perl6.org/archive/rfc/72.html
The match of a named capture does not always have a fixed length, which is why the regex engine doesn't allow a backreference inside a lookbehind. Example of a variable-length named capture:
/(?|a(?<toto>ef)|b(?<toto>ghi))/
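A short sketch that runs this example and prints what `toto` captures for each alternative. The branch reset `(?|...)` makes both alternatives write to the same group, so the engine cannot know in advance whether `toto` will be 2 or 3 characters long:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Branch reset (?|...): both alternatives write to the same group "toto",
# but one branch captures 2 characters and the other captures 3.
my $re = qr/(?|a(?<toto>ef)|b(?<toto>ghi))/;

my @lengths;
for my $s (qw(aef bghi)) {
    next unless $s =~ $re;
    push @lengths, length $+{toto};
    print "$s: toto=$+{toto}\n";
}
print "lengths: @lengths\n";   # prints "lengths: 2 3"
```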