Do recursive regexes understand named captures? There is a note in the docs for (??{ code }) that it's an independent subpattern with its own set of captures that are discarded when the subpattern is done, and there's a note in (?PARNO) that it's "similar to (??{ code })". Is (?PARNO) discarding its own named captures when it's done?
I'm writing about Perl's recursive regular expressions for Mastering Perl. perlre already has an example with balanced parens (I show it in Matching balanced parenthesis in Perl regex), so I thought I'd try balanced quote marks:
```perl
#!/usr/bin/perl
# quotes-nested.pl

use v5.10;

$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE

say "Matched!" if m/
    (
        ['"]
            (
                (?:
                    [^'"]+
                    |
                    ( (?1) )
                )*
            )
        ['"]
    )
    /xg;

print "
1 => $1
2 => $2
3 => $3
4 => $4
5 => $5
";
```
This works and the two quotes show up in $1 and $3:
```
Matched!
1 => 'Amelia said "I am a camel"'
2 => Amelia said "I am a camel"
3 => "I am a camel"
4 =>
5 =>
```
That's fine. I understand that. However, I don't want to know the numbers. So, I make the first capture group a named capture and look in %-, expecting to see the two substrings I previously saw in $1 and $3:
```perl
use v5.10;

$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE

say "Matched [$+{said}]!" if m/
    (?<said>
        ['"]
            (
                (?:
                    [^'"]+
                    |
                    (?1)
                )*
            )
        ['"]
    )
    /xg;

use Data::Dumper;
print Dumper( \%- );
```
I only see the first:
```
Matched ['Amelia said "I am a camel"']!
$VAR1 = {
          'said' => [
                      '\'Amelia said "I am a camel"\''
                    ]
        };
```
I expected that (?1) would repeat everything in the first capture group, including the named capture to said. I can fix that a bit by naming a new capture:
```perl
use v5.10;

$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE

say "Matched [$+{said}]!" if m/
    (?<said>
        ['"]
            (
                (?:
                    [^'"]+
                    |
                    (?<said> (?1) )
                )*
            )
        ['"]
    )
    /xg;

use Data::Dumper;
print Dumper( \%- );
```
Now I get what I expected:
```
Matched ['Amelia said "I am a camel"']!
$VAR1 = {
          'said' => [
                      '\'Amelia said "I am a camel"\'',
                      '"I am a camel"'
                    ]
        };
```
I thought that I could fix this by moving the named capture up one level:
```perl
use v5.10;

$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE

say "Matched [$+{said}]!" if m/
    (
        (?<said>
            ['"]
                (
                    (?:
                        [^'"]+
                        |
                        (?1)
                    )*
                )
            ['"]
        )
    )
    /xg;

use Data::Dumper;
print Dumper( \%- );
```
But this doesn't catch the smaller substring in said either:
```
Matched ['Amelia said "I am a camel"']!
$VAR1 = {
          'said' => [
                      '\'Amelia said "I am a camel"\''
                    ]
        };
```
I think I understand this, but I also know that there are people here who actually touch the C code that makes it happen. :)
And, as I write this, I think I should overload the STORE tie for %- to find out, but then I'd have to find out how to do that.
When a match against the enclosing pattern succeeds, Perl updates the magical hash %+. Its keys are the capture names and its values are the portions of the string that those named captures matched. Named captures often make a regex easier to maintain.
Perl also places the text matched by each set of capturing parentheses into the special variables $1, $2, $3, and so on. Captures let us pull out portions of a match and use them later, for example to extract a phone number from a block of contact information; named captures do the same job while giving each group an identifying name.
Named captures are available in Perl, but they are not used very frequently, and only the values captured at the top level of the pattern, not those captured inside a recursion, survive in %+ and %-. Numbered captures provide no identifying name and add nothing to %+.
Instead, Perl stores each captured string in a series of magical variables: the first capture goes into $1, the second into $2, and so on.
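As a minimal sketch of the difference, assuming a hypothetical contact string, the following fills %+ and the numbered variables from the same match, and then shows %+ and %- disagreeing when one name is used twice (as the question does with said). The sample strings and the variable names %plus, @numbered, $first, and @all are illustrative only.

```perl
use strict;
use warnings;
use v5.10;

# Hypothetical contact string, for illustration only.
my $contact = 'Call Amelia at 555-1234';

my ( %plus, @numbered );
if ( $contact =~ m/ (?<name> [A-Z][a-z]+ ) \s at \s (?<phone> \d{3}-\d{4} ) /x ) {
    %plus     = %+;           # copy, because %+ is reset by the next successful match
    @numbered = ( $1, $2 );   # named groups are also numbered, left to right
    say "name  => $+{name}";
    say "phone => $+{phone}";
    say "\$1 => $1, \$2 => $2";
}

# The same name used twice: %+ returns the leftmost defined capture,
# while %- returns every capture with that name, in order.
my ( $first, @all );
if ( 'He said hello' =~ m/ (?<w> \w+ ) \s (?<w> \w+ ) /x ) {
    $first = $+{w};
    @all   = @{ $-{w} };
    say "\$+{w} => $first";
    say "\$-{w} => [@all]";
}
```

Copying %+ into an ordinary hash right away is a common habit, since a later successful match in the same scope replaces what it reports.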
After playing around with this, I'm satisfied that what I said in the question is right. Each call to (?PARNO) gets a complete and separate set of the match variables that it discards at the end of its run.
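A minimal sketch that isolates this behavior, using a made-up input '((x))': the named group inner can only match inside the recursion, and a (?{ }) block records what it held at that moment via $^N. Once the recursion returns, that capture is discarded, so %+ reports nothing for inner after the match. The pattern, the input, and the names @seen, $after, and inner are hypothetical, chosen only to show the effect.

```perl
use strict;
use warnings;
use v5.10;

my @seen;    # values of the named group recorded while the recursion ran
my $after;   # what %+ reports for that name once the whole match succeeds

if ( '((x))' =~ m/
        (                                              # group 1, one parenthesized item
            \(
            (?:
                (?<inner> x ) (?{ push @seen, $^N })   # innermost x, record it
              | (?1)                                   # otherwise recurse into group 1
            )
            \)
        )
    /x ) {
    $after = defined $+{inner} ? $+{inner} : 'not set';
}

say "inside the recursion: @seen";
say "after the match:      $after";
```

The x is visible while the inner level is running, but only the top level's captures remain once the match is done.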
You can get all the things that matched in each sub pattern by using an array external to the pattern match operator and pushing onto it at the end of the repeated sub pattern, like in this example:
```perl
#!/usr/bin/perl
# nested_carat_n.pl

use v5.10;

$_ =<<'HERE';
Outside "Top Level 'Middle Level "Bottom Level" Middle' Outside"
HERE

my @matches;

say "Matched!" if m/
    (?(DEFINE)
        (?<QUOTE_MARK> ['"])
        (?<NOT_QUOTE_MARK> [^'"])
    )
    (
        (?<quote>(?&QUOTE_MARK))
        (?:
            (?&NOT_QUOTE_MARK)++
            |
            (?R)
        )*
        \g{quote}
    )
    (?{ push @matches, $^N })
    /x;

say join "\n", @matches;
```
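As a sketch of the same technique applied to the question's own string (an adaptation, not the answer's exact code): because (?R) recurses into the whole pattern, the (?{ }) block runs every time a quoted string closes, at every level. The \k<q> backreference makes each quote close only on the same kind of mark, and the possessive [^'"]++ limits backtracking. A push made during a sub-match that is later backtracked out of would stay in the array, which is why perlre's (?{ code }) documentation uses a localized package variable for backtracking-safe bookkeeping. The names @said and q are illustrative.

```perl
use strict;
use warnings;
use v5.10;

my $text = <<'HERE';
He said 'Amelia said "I am a camel"'
HERE

my @said;   # every quoted string, innermost first

# The code block sits right after group 1, so $^N is the quoted string
# that just closed, whichever recursion level it closed at.
say "Matched!" if $text =~ m/
    (                                  # group 1, one complete quoted string
        (?<q> ['"] )                   # the opening quote mark at this level
        (?: [^'"]++ | (?R) )*          # plain text, or a nested quoted string
        \k<q>                          # closed by the same quote mark
    )
    (?{ push @said, $^N })
    /x;

say for @said;
```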
I go through it in depth in Chapter 2 of Mastering Perl, which you can read for free (at least for a while).