I want grammar
to do something like this:
> "abc" ~~ m:ex/^ (\w ** 1..2) (\w ** 1..2) $ {say $0, $1}/
「ab」「c」
「a」「bc」
Or like this:
> my regex left { \S ** 1..2 }
> my regex right { \S ** 1..2 }
> "abc" ~~ m:ex/^ <left><right> $ {say $<left>, $<right>}/
「ab」「c」
「a」「bc」
Here is my grammar
:
grammar LR {
regex TOP {
<left>
<right>
}
regex left {
\w ** 1..2
}
regex right {
\w ** 1..2
}
}
my $string = "abc";
my $match = LR.parse($string);
say "input: $string";
printf "split: %s|%s\n", ~$match<left>, ~$match<right>;
Its output is:
$ input: abc
$ split: ab|c
So, <left>
can be only greedy leaving nothing to <right>
. How should I modify the code to match both possible variants?
$ input: abc
$ split: a|bc, ab|c
Grammars are designed to give zero or one answers, not more than that, so you have to use some tricks to make them do what you want.
Since Grammar.parse
returns just one Match
object, you have to use a different approach to get all matches:
sub callback($match) {
say $match;
}
grammar LR {
regex TOP {
<left>
<right>
$
{ callback($/) }
# make the match fail, thus forcing backtracking:
<!>
}
regex left {
\w ** 1..2
}
regex right {
\w ** 1..2
}
}
LR.parse('abc');
Making the match fail by calling the <!>
assertion (which always fails) forces the previous atoms to backtrack, and thus finding different solutions. Of course this makes the grammar less reusable, because it works outside the regular calling conventions for grammars.
Note that, for the caller, the LR.parse
seems to always fail; you get all the matches as calls to the callback function.
A slightly nicer API (but the same approach underneath) is to use gather
/take
to get a sequence of all matches:
grammar LR {
regex TOP {
<left>
<right>
$
{ take $/ }
# make the match fail, thus forcing backtracking:
<!>
}
regex left {
\w ** 1..2
}
regex right {
\w ** 1..2
}
}
.say for gather LR.parse('abc');
I think Moritz Lenz, nickname moritz, author of the upcoming book "Parsing with Perl 6 Regexes and Grammars", is the person to ask about this. I probably should have just asked him to answer this SO...
In case anyone considers attempting to modify grammar.parse
so that it supports :exhaustive
, or otherwise hacking things to do what @evb wants, the following documents potentially useful inspiration/guidance that I gleaned from spelunking the relevant speculations document (S05) and searching the #perl6 and #perl6-dev irc logs.
7 years ago Moritz added an edit of S05:
A [regex] modifier that affects only the calling behaviour, and not the regex itself [eg
:exhaustive
] may only appear on constructs that involve a call (likem//
[orgrammar.parse
]), and not onrx//
[orregex { ... }
].
(The [eg :exhaustive
], [or grammar.parse
], and [or regex { ... }
] bits are extrapolations/interpretations/speculations I've added in this SO answer. They're not in the linked source.)
5 years ago Moritz expressed interest in implementing :exhaustive
for matching (not parsing) features. Less than 2 minutes later jnthn showed a one liner that demo'd how he guessed he'd approach it. Less than 30 minutes later Moritz posted a working prototype. The final version landed 7 days later.
1 year ago Moritz said on #perl6 (emphasis added by me): "regexes and grammars aren't a good tool to find all possible ways to parse a string".
Hth.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With