I'm trying to write a Raku grammar that can parse commands that ask for programming puzzles.
This is a simplified version just for my question, but the commands combine a difficulty level with an optional list of languages.
Sample valid input:
easy
hard javascript
medium javascript python raku
I can get it to match one language, but not multiple languages. I'm not sure where to add the :g.
Here's an example of what I have so far:
grammar Command {
    rule TOP { <difficulty> <languages>? }
    token difficulty { 'easy' | 'medium' | 'hard' }
    rule languages { <language>+ }
    token language { \w+ }
}

multi sub MAIN(Bool :$test) {
    use Test;
    plan 5;
    # These first 3 pass.
    ok Command.parse('hard', :token<difficulty>), '<difficulty> can parse a difficulty';
    nok Command.parse('no', :token<difficulty>), '<difficulty> should not parse random words';
    # Why does this parse <languages>, but <language> fails below?
    ok Command.parse('js', :rule<languages>), '<languages> can parse a language';
    # These last 2 fail.
    ok Command.parse('js', :token<language>), '<language> can parse a language';
    # Why does this not match both words? Can I use :g somewhere?
    ok Command.parse('js python', :rule<languages>), '<languages> can parse multiple languages';
}
This works, even though my test #4 fails:
my token wrd { \w+ }
'js' ~~ &wrd; #=> 「js」
Extracting multiple languages works with a regex using this syntax, but I'm not sure how to use that in a grammar:
'js python' ~~ m:g/ \w+ /; #=> (「js」 「python」)
Also, is there an ideal way to make the order unimportant so that difficulty could come anywhere in the string? Example:
rule TOP { <languages>* <difficulty> <languages>? }
Ideally, I'd like for anything that is not a difficulty to be read as a language. Example: raku python medium js should read medium as a difficulty and the rest as languages.
There are two things at issue here.
To specify a subrule in a grammar parse, the named argument is always :rule, regardless of whether it's declared in the grammar as a rule, token, method, or regex. Your first two tests pass only because the unknown :token named argument is ignored, so both run a full-grammar parse (that is, TOP): 'hard' is a valid TOP match, and 'no' is not.
That gets us:
ok Command.parse('hard', :rule<difficulty>), '<difficulty> can parse a difficulty';
nok Command.parse('no', :rule<difficulty>), '<difficulty> should not parse random words';
ok Command.parse('js', :rule<languages> ), '<languages> can parse a language';
ok Command.parse('js', :rule<language> ), '<language> can parse a language';
ok Command.parse('js python', :rule<languages> ), '<languages> can parse multiple languages';
# Output
ok 1 - <difficulty> can parse a difficulty
ok 2 - <difficulty> should not parse random words
ok 3 - <languages> can parse a language
ok 4 - <language> can parse a language
not ok 5 - <languages> can parse multiple languages
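To see the first point in isolation, here is a small sketch with a throwaway grammar (Demo and its tokens are made up for illustration, they aren't part of your code). It assumes, as described above, that .parse simply ignores named arguments it doesn't know:

```raku
# Hypothetical grammar, only to contrast :rule with an unknown named argument.
grammar Demo {
    token TOP  { <word> }
    token word { <[a..z]>+ }
    token num  { \d+ }
}

# :token is not a parameter of .parse, so it is ignored and TOP is used.
my $ignored = Demo.parse('abc', :token<num>);

# :rule really selects the starting rule, so 'abc' is tried against <num>.
my $selected = Demo.parse('abc', :rule<num>);

say so $ignored;    # True  (TOP matched 'abc')
say so $selected;   # False (<num> cannot match 'abc')
```

So a test written with :token can pass or fail for reasons that have nothing to do with the token it names.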
The second issue is how implied whitespace is handled in a rule. In a token, the following are equivalent:
token foo { <alpha>+ }
token bar { <alpha> + }
But in a rule, they would be different. Compare the token equivalents for the following rules:
rule foo { <alpha>+ }
token foo { <alpha>+ <.ws> }
rule bar { <alpha> + }
token bar { [<alpha> <.ws>] + }
In your case, you have <language>+, and since language is \w+, it's impossible to match two: the first one consumes all the \w, and no whitespace is allowed between repetitions, so a second <language> can never start. Easy solution though, just change <language>+ to <language> +.
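A minimal sketch of that difference, using two made-up grammars (Tight and Spaced are hypothetical names) that differ only in the space before the +:

```raku
# Two hypothetical grammars; the only difference is the space before '+'.
grammar Tight {
    rule  languages { <language>+ }    # no whitespace allowed between repetitions
    token language  { \w+ }
}
grammar Spaced {
    rule  languages { <language> + }   # whitespace allowed after each <language>
    token language  { \w+ }
}

my $tight  = Tight.parse('js python',  :rule<languages>);
my $spaced = Spaced.parse('js python', :rule<languages>);

say so $tight;                       # False
say so $spaced;                      # True
say $spaced<language>.join(', ');    # js, python
```

No :g is needed here: inside a grammar, the quantifier does the repeating and the rule's implied <.ws> does the skipping.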
To allow the <difficulty> token to float around, the first solution that jumps to my mind is to match it and bail in a <language> token:
token language { <!difficulty> \w+ }
<!foo> will fail if, at that position, it can match <foo>. This will work almost perfectly until you get a language like 'easyFoo'. The easy fix there is to ensure that the difficulty token always occurs at a word boundary:
token difficulty {
    [
    | easy
    | medium
    | hard
    ]
    >>
}
where >> asserts a word boundary on the right.
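Putting the pieces together, one possible version of the whole grammar looks like this. It's a sketch, not the only way to do it: the TOP rule is adapted from the one in your question, and collecting the languages with .map and .Slip is just one way I'd read them out of the match.

```raku
grammar Command {
    rule  TOP        { <languages>? <difficulty> <languages>? }
    token difficulty { [ easy | medium | hard ] >> }
    rule  languages  { <language> + }
    token language   { <!difficulty> \w+ }
}

my $m = Command.parse('raku python medium js');
say ~$m<difficulty>;       # medium

# <languages> appears twice in TOP, so it is captured as a list;
# gather the <language> matches from each part into one flat list.
my @langs = $m<languages>.map({ $_<language>.Slip }).map(~*);
say @langs.join(', ');     # raku, python, js

# Thanks to >>, 'easyFoo' is read as a language, not as the difficulty.
say ~Command.parse('easyFoo hard')<difficulty>;   # hard
```

Note that this still expects exactly one difficulty per command; input with none, or with two, won't parse.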