Raku regex: How to use capturing group inside lookaheads

How can I use capturing groups inside a lookahead assertion?

This code:

say "ab" ~~ m/(a) <?before (b) > /;

returns:

「a」
 0 => 「a」

But I was expecting to also capture 'b'.

Is there a way to do so?

I don't want to leave 'b' outside of the lookahead because I don't want 'b' to be part of the match.

Is there a way to capture 'b' but still leave it outside of the match?

NOTE:

I tried to use Raku's capture markers, as in:

say "ab" ~~ m/<((a))> (b) /;

「a」
 0 => 「a」
 1 => 「b」

But this does not work as I expect: even though 'b' is left outside the match, the regex has still consumed 'b', and I don't want it to be consumed.

For example:

say 'abab' ~~ m:g/(a)<?before b>|b/;

(「a」
    0 => 「a」
 「b」 
 「a」
    0 => 「a」
 「b」)

# Four matches (what I want)
 

say 'abab' ~~ m:g/<((a))>b|b/;

(「a」
    0 => 「a」 
 「a」
    0 => 「a」)

# Two matches
asked Nov 18 '20 at 17:11 by Julio



1 Answer

Is there a way to do so?

Not really, but sort of. Three things conspire against us in trying to make this happen.

  1. Raku regex captures form trees of matches. Thus (a(b)) results in one positional capture that contains another positional capture. Why do I mention this? Because the same thing is going on with things like before, which take a regex as an argument: the regex passed to before gets its own Match object.
  2. The ? implies "do not capture". We may think of dropping it to get <before (b)>, and indeed there is a before key in the Match object now, which sounds promising except...
  3. before doesn't actually return what it matched on the inside, but instead a zero-width Match object. If it returned the inner match, forgetting the ? would make it consume input, and it would no longer be a lookahead.
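The first and third points can be checked directly. The following is only a minimal sketch, assuming a recent Rakudo; the variable names $nested and $m are illustrative and not part of the question:

```raku
# Point 1: nested parentheses produce a tree of captures.
my $nested = "ab" ~~ / (a (b)) /;
say $nested[0];       # outer positional capture, covering "ab"
say $nested[0][0];    # inner positional capture, covering "b"

# Points 2 and 3: without the ?, the assertion is stored under the key
# "before", but what is stored is a zero-width Match, not the inner "b".
my $m = "ab" ~~ / (a) <before (b)> /;
say $m.Str;           # the overall match still covers only "a"
say $m<before>.Str;   # the stored lookahead match has no text of its own
```

The inner (b) capture does exist while the lookahead is being evaluated; it is just not reachable from the outer Match afterwards.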

If only we could rescue the Match object from inside of the lookahead. Well, we can! We can declare a variable and then bind the $/ inside of the before argument regex into it:

my $lookahead;
say "ab" ~~ m/(a) <?before b {$lookahead = $/}> /;
say $lookahead;

Which gives:

「a」
 0 => 「a」
「b」

This works, although unfortunately the result is not attached to the match like a normal capture. There is no way to do that directly, but we can attach it via make:

say "ab" ~~ m/(a) :my $lookahead; <?before (b) {$lookahead = $0}> { make $lookahead } /;
say $/.made;

This produces the same output, but now the lookahead's match is reliably attached to each Match object coming back from m:g, so the approach is robust, even if not beautiful.
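Applied to the 'abab' example from the question, the make approach might look like the sketch below. It is an untested adaptation under assumptions rather than code from the answer: the grouping with [ ], the placement of :my before the alternation, and the names @matches and $m are choices made here for illustration.

```raku
# Each "a" that is followed by "b" matches as "a" alone; the "b" seen by
# the lookahead is attached to that match via make. A bare "b" matches
# through the second alternative and has nothing attached.
my @matches = 'abab' ~~ m:g/
    :my $lookahead;
    [
    | (a) <?before (b) { $lookahead = $0 }> { make $lookahead }
    | b
    ]
/;

for @matches -> $m {
    say "matched: {$m.Str}, lookahead: {$m.made // '(none)'}";
}
```

Because 'b' is never consumed by the first alternative, it stays available to be matched on its own, which is the four-match behaviour the question asks for.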

answered Sep 26 '22 at 23:09 by Jonathan Worthington