
Why/how is an additional variable needed when matching a repeated arbitrary character with capture groups?

Tags: regex, raku

I'm matching a run of a single repeated arbitrary character, with a minimum length, using a Perl 6 regex.

After reading through https://docs.perl6.org/language/regexes#Capture_numbers and tweaking the example given, I've come up with this code using an 'external variable':

#uses an additional variable $c
perl6 -e '$_="bbaaaaawer"; /((.){} :my $c=$0; ($c)**2..*)/ && print $0';

#Output:  aaaaa

Purely to illustrate my question, here is a similar regex in Perl 5:

#No additional variable needed
perl -e ' $_="bbaaaaawer"; /((.)\2{2,})/ && print $1';

Could someone enlighten me on the need/benefit of 'saving' $0 into $c and the requirement of the empty {}? Is there an alternative (better/golfed) perl6 regex that will match?

Thanks in advance.

drclaw asked May 31 '19 11:05


1 Answer

Perl 6 regexes scale up to full grammars, which produce parse trees. Those parse trees are a tree of Match objects. Each capture - named or positional - is either a Match object or, if quantified, an array of Match objects.

This is generally a good thing, but it does involve the trade-off you have observed: once you are inside a nested capturing element, you are populating a new Match object, with its own set of positional and named captures. For example, if we do:

say "abab" ~~ /((a)(b))+/

Then the result is:

「abab」
 0 => 「ab」
  0 => 「a」
  1 => 「b」
 0 => 「ab」
  0 => 「a」
  1 => 「b」

And we can then index:

say $0;        # The array of the top-level capture, which was quantified
say $0[1];     # The second Match
say $0[1][0];  # The first Match within that Match object (the (a))

It is a departure from regex tradition, but also an important part of scaling up to larger parsing challenges.
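
To tie this back to the regex in the question, here is a minimal sketch of how the nesting plays out there. It assumes, as I read the linked Capture_numbers documentation, that the empty block {} is what makes the captures matched so far visible through $/ (and so $0) before `:my $c = $0` reads them. The variable name $s and the shorter second pattern are my own illustration, not something the question or the documentation prescribes.

```raku
my $s = "bbaaaaawer";

# Inside the outer ( ... ) a new Match object is being populated, so $0 at
# that point means "first capture of the outer group", i.e. the inner (.).
# The {} is assumed to publish the match state so that $0 is up to date
# when it is copied into $c. The copy is needed because ($c) opens yet
# another capture, and inside it $0 would refer to that capture's own groups.
if $s ~~ / ( (.) {} :my $c = $0; ($c) ** 2..* ) / {
    say ~$0;       # the whole run, captured by the outer group
    say ~$0[0];    # the single character captured by the inner (.)
}

# Assumed shorter alternative: drop the extra capture groups, use $0 as a
# backreference at the top level, and read the overall match from $/.
if $s ~~ / (.) $0 ** 2..* / {
    say ~$/;
}
```

The second form avoids the helper variable only because the backreference sits at the same nesting level as the capture it refers to, and because the run is read from the overall match rather than from an enclosing capture.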

Jonathan Worthington answered Nov 15 '22 22:11