
Using 'after' as a lookbehind in a Raku grammar

I'm trying to do a match in a Raku grammar and failing with 'after'. I've boiled my problem down to the following snippet:

grammar MyGrammar {

    token TOP {
        <character>
    }

    token character {
        <?after \n\n>LUKE
    }
}

say MyGrammar.subparse("\n\nLUKE");

With MyGrammar.subparse this prints #<failed match>; with MyGrammar.parse it prints Nil.

But if I run a match in the REPL:

"\n\nLUKE" ~~ /<?after \n\n>LUKE/

I get the match 「LUKE」

So there's something I'm not understanding, and I'm not sure what. Any pointers?

MorayJ asked Jul 01 '20 22:07


1 Answer

When we parse a string using a grammar, the matching is anchored to the start of the string. Parsing the input with parse requires us to consume all of the string. There is also a subparse, which allows us to not consume all of the input, but this is still anchored to the start of the string.
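A minimal sketch of the anchoring behaviour described above. The grammar name Demo and the input strings are made up for illustration; they are not from the question.

```raku
# Demo is a hypothetical grammar used only for this illustration.
grammar Demo {
    token TOP { LUKE }
}

# parse must consume the whole string, so the trailing "!" makes it fail.
say so Demo.parse("LUKE!");        # False

# subparse may stop before the end of the input, so it succeeds here.
say so Demo.subparse("LUKE!");     # True

# Both methods start matching at position 0, so the leading newlines
# make even subparse fail.
say so Demo.subparse("\n\nLUKE");  # False
```

The last call is essentially the situation in the question: nothing in TOP accounts for the two newlines at the start of the input.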

By contrast, a regex like /<?after \n\n>LUKE/ will scan through the string, trying to match the pattern at each position in the string, until it finds a position at which it matches (or gets to the end of the string and gives up). This is why it works. Note, however, that if your goal is to not capture the \n\n, then you could instead have written the regex as /\n\n <( LUKE/, where <( indicates where to start capturing. At least on the current Rakudo compiler implementation, this way is more efficient.
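The difference between scanning and anchoring can be sketched with plain regexes. The variable names here are illustrative assumptions, and the explicit ^ anchor only approximates what a grammar does implicitly.

```raku
my $input = "\n\nLUKE";

# Unanchored: the regex is tried at successive positions and succeeds
# once it reaches the "L", where two newlines precede the current position.
my $scanned = $input ~~ / <?after \n\n> LUKE /;
say $scanned.from;   # 2

# Anchored with ^: only position 0 is allowed, and nothing precedes it,
# so the lookbehind cannot succeed.
say so $input ~~ / ^ <?after \n\n> LUKE /;   # False

# Consuming the newlines and starting the capture at <( gives the same
# matched text without a lookbehind, and also works when anchored.
my $captured = $input ~~ / ^ \n\n <( LUKE /;
say $captured.Str;   # LUKE
```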

It's not easy to suggest how to write the grammar without a little more context (I'm guessing this is extracted from a larger problem). You could, for example, consume whitespace at the start of the grammar:

grammar MyGrammar {

    token TOP {
        \s+ <character>
    }

    token character {
        <?after \n\n>LUKE
    }
}

say MyGrammar.subparse("\n\nLUKE");

Or consume the \n\n in character but exclude it from the match with <(, as mentioned earlier.
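That second option might look something like the following sketch. The grammar name LukeGrammar is hypothetical, chosen only to keep it apart from the grammar above; whether this shape fits depends on the larger grammar this was extracted from.

```raku
# LukeGrammar is a hypothetical name used only for this illustration.
grammar LukeGrammar {
    token TOP {
        <character>
    }

    token character {
        # Consume the two newlines, then start the capture at <(
        \n\n <( LUKE
    }
}

my $match = LukeGrammar.parse("\n\nLUKE");
say so $match;               # True
say $match<character>.Str;   # LUKE
```

Because character itself consumes the newlines, the whole input is accounted for and parse succeeds, while the text captured under character is only LUKE.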

Jonathan Worthington answered Sep 21 '22 16:09