Why do @+ and @{^CAPTURE} differ in length?

Question

I'm trying to understand how the regex variables work, so I can save submatch positions in the payload within embedded code expressions. According to perlvar, the positive indices of the @+ array correspond to $1, $2, $3, etc., but that doesn't seem to be the case while the match is still running?

#!/usr/bin/perl -w
use v5.28;
use Data::Dumper;

"XY" =~ / ( (.*) (.) (?{ 
    say Dumper { match_end => \@+ };
    say Dumper { capture => \@{^CAPTURE} }
}) ) (.)/x;

Output:

$VAR1 = {
          'match_end' => [
                           2,
                           undef,
                           1,
                           2,
                           undef
                         ]
        };

$VAR1 = {
          'capture' => [
                         undef,
                         'X',
                         'Y'
                       ]
        };

$VAR1 = {
          'match_end' => [
                           1,
                           2,
                           0,
                           1,
                           undef
                         ]
        };

$VAR1 = {
          'capture' => [
                         'XY',
                         '',
                         'X'
                       ]
        };

zdim · Accepted Answer

The @+ array apparently gets allocated, or otherwise prepared, already at compilation:

perl -MData::Dump=dd -we'$_=q(abc); / (?{dd @+})  ( (.) )/x'

prints

(0, undef, undef)

(0 for the whole match and an undef for each indicated capture group), while

perl -MData::Dump=dd -we'$_=q(abc); / (?{dd @+})  ( (.) (.) )/x'

prints

(0, undef, undef, undef)

with one more element for one more capture group.

On the other hand, @{^CAPTURE} is simply empty until a capture group has actually captured something, as we can see from mob's detailed analysis. This, I'd say, fits its name well.
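To make the contrast concrete, here is a minimal sketch that counts the elements of both arrays from a code block placed at the very start of the pattern. The variables $n_ends and $n_caps are hypothetical names introduced only for this illustration, and the counts are kept in lexicals so they can be inspected after the match.

```perl
use strict;
use warnings;
use v5.28;    # @{^CAPTURE} needs Perl 5.26 or later

# Hypothetical counters, filled in by the embedded code block below.
# The block runs before any capture group has been entered.
my ($n_ends, $n_caps);

"abc" =~ /
    (?{ $n_ends = scalar @+; $n_caps = scalar @{^CAPTURE}; })
    ( (.) )
/x;

say "elements in \@+ at pattern start:          $n_ends";
say "elements in \@{^CAPTURE} at pattern start: $n_caps";
```

Going by the outputs quoted in this thread, this should report three elements for @+ (the whole match plus two groups) and none for @{^CAPTURE}.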

Once the match has completed the arrays agree, apart from a shift of one in the indices: @+ also contains the end offset of the whole match, at $+[0], so $+[1] belongs to the same group as element 0 of @{^CAPTURE}, and so on.
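A small sketch of that index shift, assuming a pattern in which every group matches. The @rows array and the @caps copy are hypothetical names used only here; each row pairs the text cut out with the @- and @+ offsets of group $n against element $n - 1 of the captures.

```perl
use strict;
use warnings;
use v5.28;

my $text = "abc";
my @rows;    # hypothetical holder: [group, start, end, via offsets, via captures]

if ($text =~ /(a)(b)(c)/) {
    my @caps = @{^CAPTURE};    # plain copy, so indexing is ordinary array access
    for my $n (1 .. $#+) {
        # $-[$n] and $+[$n] delimit group $n in $text; the same text sits
        # at index $n - 1 of the captures, which have no whole-match slot.
        my $from_offsets = substr $text, $-[$n], $+[$n] - $-[$n];
        push @rows, [ $n, $-[$n], $+[$n], $from_offsets, $caps[ $n - 1 ] ];
    }
}

say join "\t", @$_ for @rows;
```

The last two columns of every row should hold the same string.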

Another difference is that a trailing failed optional match doesn't get a slot in @{^CAPTURE}:

perl -MData::Dump=dd -we'$_=q(abc); /((x)? (.) (x)?)/x; dd @+; dd @{^CAPTURE}'

prints

(1, 1, undef, 1, undef)
("a", undef, "a")
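The same match can also be looked at purely in terms of array lengths. The sketch below copies @+, @- and @{^CAPTURE} into ordinary arrays (the names @ends, @starts and @caps are made up for this example) and prints how many elements each has. The comparison with @- is my own addition: perlvar describes $#+ as the number of subgroups and $#- as the last matched subgroup, and @{^CAPTURE} appears to follow the latter, though I have not seen that stated explicitly.

```perl
use strict;
use warnings;
use v5.28;

# Hypothetical copies of the match variables, taken right after the match.
my (@ends, @starts, @caps);

if ("abc" =~ /((x)? (.) (x)?)/x) {
    @ends   = @+;            # one slot per group in the pattern, plus slot 0
    @starts = @-;            # stops at the last group that actually matched
    @caps   = @{^CAPTURE};   # like @-, but without the whole-match slot
}

say "elements in \@+:          ", scalar @ends;
say "elements in \@-:          ", scalar @starts;
say "elements in \@{^CAPTURE}: ", scalar @caps;
```

With four groups of which the last one fails, this should give 5, 4 and 3 elements respectively, so the length difference between @+ and @{^CAPTURE} is more than the shift of one whenever trailing groups did not participate.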

mob · Answer

The perlvar docs are unclear about what @{^CAPTURE} looks like in the middle of a regexp evaluation, but there is a clear progression that depends on where in the regexp you look at it.

use 5.026;
use Data::Dumper; $Data::Dumper::Sortkeys = 1; $Data::Dumper::Indent = 0;

sub DEBUG_CAPTURE { say Dumper { a => $_[0], capture => \@{^CAPTURE} }; }

"XY" =~ /
   (?{DEBUG_CAPTURE(0)})
   (
     (?{DEBUG_CAPTURE(1)}) 
     (
             (?{DEBUG_CAPTURE(2)})
        (.*) (?{DEBUG_CAPTURE(3)}) 
         (.) (?{DEBUG_CAPTURE(4)}) 
     ) 
     (?{DEBUG_CAPTURE(5)}) (.)
     (?{DEBUG_CAPTURE(6)})  
   )
   (?{DEBUG_CAPTURE(7)})    /x;    
DEBUG_CAPTURE(8);

Output

$VAR1 = {'a' => 0,'capture' => []};
$VAR1 = {'a' => 1,'capture' => []};
$VAR1 = {'a' => 2,'capture' => []};
$VAR1 = {'a' => 3,'capture' => [undef,undef,'XY']};
$VAR1 = {'a' => 3,'capture' => [undef,undef,'X']};
$VAR1 = {'a' => 4,'capture' => [undef,undef,'X','Y']};
$VAR1 = {'a' => 5,'capture' => [undef,'XY','X','Y']};
$VAR1 = {'a' => 3,'capture' => [undef,'XY','','Y']};
$VAR1 = {'a' => 4,'capture' => [undef,'XY','','X']};
$VAR1 = {'a' => 5,'capture' => [undef,'X','','X']};
$VAR1 = {'a' => 6,'capture' => [undef,'X','','X','Y']};
$VAR1 = {'a' => 7,'capture' => ['XY','X','','X','Y']};
$VAR1 = {'a' => 8,'capture' => ['XY','X','','X','Y']};

The docs are correct if you are observing @{^CAPTURE} after a regexp has been completely evaluated. While evaluation is in progress, @{^CAPTURE} seems to grow as the number of capture groups that have been closed increases. But it's not clear how useful it is to look at @{^CAPTURE} before you get to the end of the expression.
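If the goal is to save the captures from embedded code, one option suggested by the output above is to put the code block at the very end of the pattern, where steps 7 and 8 show identical contents. A minimal sketch, assuming the same string and grouping as above; @snapshot and @final are hypothetical names, and the block simply overwrites @snapshot each time backtracking makes it run again, so only the last (successful) pass survives.

```perl
use strict;
use warnings;
use v5.28;

# Hypothetical lexical that the embedded code block closes over.
# It is overwritten on every pass through the end of the pattern.
my @snapshot;

"XY" =~ /
    ( ( (.*) (.) ) (.) )
    (?{ @snapshot = @{^CAPTURE} })
/x;

# The regular, documented view once the match has finished.
my @final = @{^CAPTURE};

say "inside the pattern: ", join ",", map { "'$_'" } @snapshot;
say "after the match:    ", join ",", map { "'$_'" } @final;
```

Both lines should list the same five captures. A code block placed any earlier in the pattern would see only the groups closed so far, as the progression above shows.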
