Is it possible to have a capture within an interpolated regex?

I wanted to generate a regex from an existing list of values, but when I used a capture within it, the capture was not present in the match. Is it not possible to capture through interpolation, or am I doing something wrong?

my @keys = <foo bar baz>;
my $test-pattern = @keys.map({ "<$_>" }).join(' || ');

grammar Demo1 {
  token TOP {
    [
      || <foo>
      || <bar>
      || <baz>
    ] ** 1..* % \s+
  }

  token foo { 1 }
  token bar { 2 }
  token baz { 3 }
}

grammar Demo2 {
  token TOP {
    [ <$test-pattern> ] ** 1..* % \s+
  }

  token foo { 1 }
  token bar { 2 }
  token baz { 3 }
}

say $test-pattern, "\n" x 2, Demo1.parse('1 2 3'), "\n" x 2, Demo2.parse('1 2 3');

Output:

<foo> || <bar> || <baz>

「1 2 3」
 foo => 「1」
 bar => 「2」
 baz => 「3」

「1 2 3」
asked Dec 04 '20 13:12 by Daniel Mita

1 Answer

The rule for determining whether an atom of the form <...> captures without further ado is whether or not it starts with a letter or underscore.

If an assertion starts with a letter or underscore, then an identifier is expected/parsed and a match is captured using that identifier as the key in the enclosing match object. For example, <foo::baz-bar qux> begins with a letter and captures under the key foo::baz-bar.

If an assertion does not begin with a letter or underscore, then by default it does not capture.
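The rule can be seen in a small sketch outside any grammar. The variable $digits and the input string are made up for illustration; <alpha> is a built-in character class assertion, and <$digits> interpolates the string's contents as a regex.

```raku
# Hypothetical pattern held in a variable; its contents are interpreted as regex.
my $digits = '\d+';

# <alpha> starts with a letter, so it captures under the key "alpha".
# <$digits> starts with "$", so it matches but does not capture by default.
my $m = 'abc 42' ~~ / <alpha>+ \s <$digits> /;

say $m;        # the overall match, followed by whatever was captured
say $m.keys;   # the named capture keys present in the match object
```

If the rule holds as described, the digits should be part of the overall match without appearing under a key of their own.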


To capture the results of an assertion whose first character is not a letter or underscore you can either put it in parens or name it:

( <$test-pattern> ) ** 1..* % \s+

or, to name the assertion:

<test-pattern=$test-pattern> ** 1..* % \s+

or (just another way to have the same naming effect):

$<test-pattern>=<$test-pattern> ** 1..* % \s+

If all you do is put an otherwise non-capturing assertion in parens, then you have not switched capturing on for that assertion. Instead, you've merely wrapped it in an outer capture. The assertion remains non-capturing, and any sub-capture data of the non-capturing assertion is thrown away.

Thus the output of the first solution shown above (wrapping the <$test-pattern> assertion in parens) is:

「1 2 3」
 0 => 「1」
 0 => 「2」
 0 => 「3」

Sometimes that's exactly what you want, because it simplifies the parse tree and/or saves memory.
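As a rough, self-contained sketch of the parenthesized form, the grammar below reuses the question's tokens. The name Demo3 is hypothetical, and the loop is just one way to inspect each positional capture.

```raku
my @keys = <foo bar baz>;
my $test-pattern = @keys.map({ "<$_>" }).join(' || ');

# Demo3 is a hypothetical name; the tokens are copied from the question.
grammar Demo3 {
  token TOP {
    ( <$test-pattern> ) ** 1..* % \s+
  }

  token foo { 1 }
  token bar { 2 }
  token baz { 3 }
}

my $match = Demo3.parse('1 2 3');

# The parens create positional capture 0; because it is quantified,
# $match[0] holds one Match object per repetition.
for $match[0].list -> $m {
  say "text: $m, named sub-captures: ", $m.keys.elems;
}
```

Each entry should carry only the matched text, with no foo, bar, or baz sub-captures, in line with the output shown above.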

In contrast, if you name an otherwise non-capturing assertion with either of the named forms shown above, then by doing so you convert it into a capturing assertion, which means any sub capture detail will be retained. Thus the named solutions produce:

「1 2 3」
 test-pattern => 「1」
  foo => 「1」
 test-pattern => 「2」
  bar => 「2」
 test-pattern => 「3」
  baz => 「3」
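For comparison, here is a minimal sketch of the first named form placed in a complete grammar. Demo4 is a hypothetical name, and the loop is only one way to walk the retained sub-captures.

```raku
my @keys = <foo bar baz>;
my $test-pattern = @keys.map({ "<$_>" }).join(' || ');

# Demo4 is a hypothetical name; the tokens are copied from the question.
grammar Demo4 {
  token TOP {
    <test-pattern=$test-pattern> ** 1..* % \s+
  }

  token foo { 1 }
  token bar { 2 }
  token baz { 3 }
}

my $match = Demo4.parse('1 2 3');

# Naming the assertion turns it into a capturing one, so each repetition
# keeps the sub-capture that says which token matched.
for $match<test-pattern>.list -> $m {
  say "text: $m, matched via: ", $m.keys;
}
```

The keys printed for each entry tell you which of the generated alternatives matched, which is the information the original Demo2 grammar discarded.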
answered Nov 15 '22 16:11 by raiph