Why do @+ and @{^CAPTURE} differ in length?

Tags: regex, perl

I'm trying to understand how the regex variables work, so that I can save submatch positions in the payload from within embedded code expressions. According to perlvar, the positive indices of @+ correspond to $1, $2, $3, etc., but that doesn't seem to be the case?

#!/usr/bin/perl -w
use v5.28;
use Data::Dumper;

"XY" =~ / ( (.*) (.) (?{ 
    say Dumper { match_end => \@+ };
    say Dumper { capture => \@{^CAPTURE} }
}) ) (.)/x;

Output:

$VAR1 = {
          'match_end' => [
                           2,
                           undef,
                           1,
                           2,
                           undef
                         ]
        };

$VAR1 = {
          'capture' => [
                         undef,
                         'X',
                         'Y'
                       ]
        };

$VAR1 = {
          'match_end' => [
                           1,
                           2,
                           0,
                           1,
                           undef
                         ]
        };

$VAR1 = {
          'capture' => [
                         'XY',
                         '',
                         'X'
                       ]
        };

rubystallion asked Feb 13 '20 16:02


2 Answers

The @+ array apparently gets allocated, or otherwise prepared, already at compilation:

perl -MData::Dump=dd -we'$_=q(abc); / (?{dd @+})  ( (.) )/x'

prints

(0, undef, undef)

(0 for the whole match and an undef for each indicated capture group), while

perl -MData::Dump=dd -we'$_=q(abc); / (?{dd @+})  ( (.) (.) )/x'

prints

(0, undef, undef, undef)

with one more element for one more capture group.

On the other hand, @{^CAPTURE} is just plain empty until there are actual patterns to capture, as we can see from mob's detailed analysis. This, I'd say, plays well with its name.

After the fact the arrays agree, apart from a shift of one in the indices: @+ also contains the (end offset of the) whole match, at $+[0], so $+[1] pairs with ${^CAPTURE}[0], and so on.
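To make that index shift concrete, here is a minimal sketch that, after a completed match, rebuilds each group from its @- and @+ offsets and compares it with the matching slot in @{^CAPTURE}. The string, the pattern and the @rows name are made up for illustration.

```perl
use strict;
use warnings;
use v5.26;    # @{^CAPTURE} is available from Perl 5.26

my $str = "abc";
my @rows;     # hypothetical name: [group number, text via offsets, text via @{^CAPTURE}]

if ( $str =~ /((.)(.))(.)/ ) {
    for my $n ( 1 .. $#+ ) {
        # $-[$n] and $+[$n] are the start and end offsets of group $n.
        my $by_offset = substr( $str, $-[$n], $+[$n] - $-[$n] );
        # The same group sits one slot lower in @{^CAPTURE}.
        my $by_capture = ${^CAPTURE}[ $n - 1 ];
        push @rows, [ $n, $by_offset, $by_capture ];
    }
}

say "group $_->[0]: '$_->[1]' vs '$_->[2]'" for @rows;
```

Every group takes part in this match, so both columns should be identical on each row.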

Another difference is that a trailing failed optional match doesn't get a slot in @{^CAPTURE}:

perl -MData::Dump=dd -we'$_=q(abc); /((x)? (.) (x)?)/x; dd @+; dd @{^CAPTURE}'

prints

(1, 1, undef, 1, undef)
("a", undef, "a")
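If you need one entry per capture group no matter which groups took part, one option is to loop up to $#+ and rebuild each group from its offsets instead of relying on the length of @{^CAPTURE}. This is only a sketch of that idea; @padded and $groups_in_pattern are hypothetical names.

```perl
use strict;
use warnings;
use v5.26;

my $str = "abc";
$str =~ /((x)? (.) (x)?)/x or die "no match";

# $#+ is the index of the last slot in @+, one slot per capture group.
my $groups_in_pattern = $#+;

# One entry per group; undef where the group did not take part in the match.
my @padded;
for my $n ( 1 .. $groups_in_pattern ) {
    push @padded,
        defined( $+[$n] )
        ? substr( $str, $-[$n], $+[$n] - $-[$n] )
        : undef;
}

say "groups in pattern: $groups_in_pattern";
say join ", ", map { $_ // 'undef' } @padded;
```

The resulting list keeps a trailing undef for the last optional group, so its length always matches the pattern.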

zdim answered Oct 08 '22 20:10


The perlvar docs are unclear about what @{^CAPTURE} looks like in the middle of a regexp evaluation, but there is a clear progression that depends on where in the regexp you are looking at it.

use 5.026;
use Data::Dumper; $Data::Dumper::Sortkeys = 1; $Data::Dumper::Indent = 0;

sub DEBUG_CAPTURE { say Dumper { a => $_[0], capture => \@{^CAPTURE} }; }

"XY" =~ /
   (?{DEBUG_CAPTURE(0)})
   (
     (?{DEBUG_CAPTURE(1)}) 
     (
             (?{DEBUG_CAPTURE(2)})
        (.*) (?{DEBUG_CAPTURE(3)}) 
         (.) (?{DEBUG_CAPTURE(4)}) 
     ) 
     (?{DEBUG_CAPTURE(5)}) (.)
     (?{DEBUG_CAPTURE(6)})  
   )
   (?{DEBUG_CAPTURE(7)})    /x;    
DEBUG_CAPTURE(8);

Output

$VAR1 = {'a' => 0,'capture' => []};
$VAR1 = {'a' => 1,'capture' => []};
$VAR1 = {'a' => 2,'capture' => []};
$VAR1 = {'a' => 3,'capture' => [undef,undef,'XY']};
$VAR1 = {'a' => 3,'capture' => [undef,undef,'X']};
$VAR1 = {'a' => 4,'capture' => [undef,undef,'X','Y']};
$VAR1 = {'a' => 5,'capture' => [undef,'XY','X','Y']};
$VAR1 = {'a' => 3,'capture' => [undef,'XY','','Y']};
$VAR1 = {'a' => 4,'capture' => [undef,'XY','','X']};
$VAR1 = {'a' => 5,'capture' => [undef,'X','','X']};
$VAR1 = {'a' => 6,'capture' => [undef,'X','','X','Y']};
$VAR1 = {'a' => 7,'capture' => ['XY','X','','X','Y']};
$VAR1 = {'a' => 8,'capture' => ['XY','X','','X','Y']};

The docs are correct if you are observing @{^CAPTURE} after a regexp has been completely evaluated. While evaluation is in progress, @{^CAPTURE} seems to grow as the number of capture groups encountered increases. But it's not clear how useful it is to look at @{^CAPTURE} before you get to the end of the expression.
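For the original goal of saving submatch positions, a cautious approach might be to treat anything recorded inside the code block as a snapshot of an attempt that can still be backtracked away, and to copy @- and @+ only once the match has finished. A rough sketch using the regexp from the question follows; @seen, @final_start and @final_end are hypothetical names.

```perl
use strict;
use warnings;
use v5.26;

our @seen;    # hypothetical name: one copy of @+ per visit to the code block

my $ok = "XY" =~ / ( (.*) (.) (?{ push @seen, [@+] }) ) (.) /x;

# Offsets copied after the match has finished describe the final match only.
my @final_start = @-;
my @final_end   = @+;

say "code block ran ", scalar(@seen), " time(s)";
say "final group ends: @final_end[ 1 .. $#final_end ]";
```

The code block runs once per attempt that reaches it, so @seen can contain offsets from attempts that were later abandoned, while the copies taken after the match describe only the groups that made it into the result.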

mob answered Oct 08 '22 20:10