I'm trying to understand how the regex variables work, so I can save submatch positions in the payload within embedded code expressions. According to perlvar
, the positive indices of the array correspond to $1, $2, $3, etc., but that doesn't seem to be the case?
#!/usr/bin/perl -w
use v5.28;
use Data::Dumper;
"XY" =~ / ( (.*) (.) (?{
say Dumper { match_end => \@+ };
say Dumper { capture => \@{^CAPTURE} }
}) ) (.)/x;
Output:
$VAR1 = {
'match_end' => [
2,
undef,
1,
2,
undef
]
};
$VAR1 = {
'capture' => [
undef,
'X',
'Y'
]
};
$VAR1 = {
'match_end' => [
1,
2,
0,
1,
undef
]
};
$VAR1 = {
'capture' => [
'XY',
'',
'X'
]
};
The @+
array apparently gets allocated, or otherwise prepared, already at compilation
perl -MData::Dump=dd -we'$_=q(abc); / (?{dd @+}) ( (.) )/x'
prints
(0, undef, undef)
(0 for the whole match and an undef
for each indicated capture group), while
perl -MData::Dump=dd -we'$_=q(abc); / (?{dd @+}) ( (.) (.) )/x'
prints
(0, undef, undef, undef)
with one more element for one more capture group.
One the other hand, the @{^CAPTURE}
is just plain empty until there are actual patterns to capture, as we can see from mob's detailed analysis. This, I'd say, plays well with its name.
After the fact the arrays agree, with that shift of one in indices since @+
also contains (offset for) the whole match, at $+[0]
.
Another difference is that a trailing failed optional match doesn't get a slot in @{^CAPTURE}
perl -MData::Dump=dd -we'$_=q(abc); /((x)? (.) (x)?)/x; dd @+; dd @{^CAPTURE}'
prints
(1, 1, undef, 1, undef) ("a", undef, "a")
The perlvar docs are unclear about what @{^CAPTURE}
look like in the middle of a regexp evaluation, but there is a clear progression that depends where in the regexp you are looking at it.
use 5.026;
use Data::Dumper; $Data::Dumper::Sortkeys = 1; $Data::Dumper::Indent = 0;
sub DEBUG_CAPTURE { say Dumper { a => $_[0], capture => \@{^CAPTURE} }; }
"XY" =~ /
(?{DEBUG_CAPTURE(0)})
(
(?{DEBUG_CAPTURE(1)})
(
(?{DEBUG_CAPTURE(2)})
(.*) (?{DEBUG_CAPTURE(3)})
(.) (?{DEBUG_CAPTURE(4)})
)
(?{DEBUG_CAPTURE(5)}) (.)
(?{DEBUG_CAPTURE(6)})
)
(?{DEBUG_CAPTURE(7)}) /x;
DEBUG_CAPTURE(8);
$VAR1 = {'a' => 0,'capture' => []};
$VAR1 = {'a' => 1,'capture' => []};
$VAR1 = {'a' => 2,'capture' => []};
$VAR1 = {'a' => 3,'capture' => [undef,undef,'XY']};
$VAR1 = {'a' => 3,'capture' => [undef,undef,'X']};
$VAR1 = {'a' => 4,'capture' => [undef,undef,'X','Y']};
$VAR1 = {'a' => 5,'capture' => [undef,'XY','X','Y']};
$VAR1 = {'a' => 3,'capture' => [undef,'XY','','Y']};
$VAR1 = {'a' => 4,'capture' => [undef,'XY','','X']};
$VAR1 = {'a' => 5,'capture' => [undef,'X','','X']};
$VAR1 = {'a' => 6,'capture' => [undef,'X','','X','Y']};
$VAR1 = {'a' => 7,'capture' => ['XY','X','','X','Y']};
$VAR1 = {'a' => 8,'capture' => ['XY','X','','X','Y']};
The docs are correct if you are observing @{^CAPTURE}
after a regexp has been completely evaluated. While evaluation is in process, @{^CAPTURE}
seems to grow as the number of capture groups encountered increases. But it's not clear how useful it is to look at @{^CAPTURE}
at least until you get to the end of the expression.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With