Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to make Perl 6 grammar produce more than one match (like :ex and :ov)?

Tags:

raku

I want grammar to do something like this:

> "abc" ~~ m:ex/^ (\w ** 1..2) (\w ** 1..2) $ {say $0, $1}/
「ab」「c」
「a」「bc」

Or like this:

> my regex left { \S ** 1..2  }
> my regex right { \S ** 1..2  }
> "abc" ~~ m:ex/^ <left><right> $ {say $<left>, $<right>}/
「ab」「c」
「a」「bc」

Here is my grammar:

grammar LR {
  regex TOP {
    <left> 
    <right>
  }
  regex left {
    \w ** 1..2 
  }
  regex right {
    \w ** 1..2 
  }
}

my $string = "abc";
my $match = LR.parse($string);
say "input: $string";
printf "split: %s|%s\n", ~$match<left>, ~$match<right>;

Its output is:

$ input: abc
$ split: ab|c

So, <left> can be only greedy leaving nothing to <right>. How should I modify the code to match both possible variants?

$ input: abc
$ split: a|bc, ab|c
like image 406
Eugene Barsky Avatar asked Oct 23 '17 20:10

Eugene Barsky


2 Answers

Grammars are designed to give zero or one answers, not more than that, so you have to use some tricks to make them do what you want.

Since Grammar.parse returns just one Match object, you have to use a different approach to get all matches:

sub callback($match) {
    say $match;
}
grammar LR {
    regex TOP {
        <left> 
        <right>
        $
        { callback($/) }
        # make the match fail, thus forcing backtracking:
        <!>
    }
    regex left {
        \w ** 1..2 
    }
    regex right {
        \w ** 1..2 
    }
}

LR.parse('abc');

Making the match fail by calling the <!> assertion (which always fails) forces the previous atoms to backtrack, and thus finding different solutions. Of course this makes the grammar less reusable, because it works outside the regular calling conventions for grammars.

Note that, for the caller, the LR.parse seems to always fail; you get all the matches as calls to the callback function.

A slightly nicer API (but the same approach underneath) is to use gather/take to get a sequence of all matches:

grammar LR {
    regex TOP {
        <left> 
        <right>
        $
        { take $/ }
        # make the match fail, thus forcing backtracking:
        <!>
    }
    regex left {
        \w ** 1..2 
    }
    regex right {
        \w ** 1..2 
    }
}

.say for gather LR.parse('abc');
like image 154
moritz Avatar answered Oct 09 '22 23:10

moritz


I think Moritz Lenz, nickname moritz, author of the upcoming book "Parsing with Perl 6 Regexes and Grammars", is the person to ask about this. I probably should have just asked him to answer this SO...

Notes

In case anyone considers attempting to modify grammar.parse so that it supports :exhaustive, or otherwise hacking things to do what @evb wants, the following documents potentially useful inspiration/guidance that I gleaned from spelunking the relevant speculations document (S05) and searching the #perl6 and #perl6-dev irc logs.

7 years ago Moritz added an edit of S05:

A [regex] modifier that affects only the calling behaviour, and not the regex itself [eg :exhaustive] may only appear on constructs that involve a call (like m// [or grammar.parse]), and not on rx// [or regex { ... }].

(The [eg :exhaustive], [or grammar.parse], and [or regex { ... }] bits are extrapolations/interpretations/speculations I've added in this SO answer. They're not in the linked source.)

5 years ago Moritz expressed interest in implementing :exhaustive for matching (not parsing) features. Less than 2 minutes later jnthn showed a one liner that demo'd how he guessed he'd approach it. Less than 30 minutes later Moritz posted a working prototype. The final version landed 7 days later.

1 year ago Moritz said on #perl6 (emphasis added by me): "regexes and grammars aren't a good tool to find all possible ways to parse a string".

Hth.

like image 21
raiph Avatar answered Oct 09 '22 22:10

raiph