
How do I match using :global in Raku grammar?

Tags:

raku

I'm trying to write a Raku grammar that can parse commands that ask for programming puzzles.

This is a simplified version just for my question, but the commands combine a difficulty level with an optional list of languages.

Sample valid input:

  • No language: easy
  • One language: hard javascript
  • Multiple languages: medium javascript python raku

I can get it to match one language, but not multiple languages. I'm not sure where to add the :g.

Here's an example of what I have so far:

grammar Command {
    rule TOP { <difficulty> <languages>? }

    token difficulty { 'easy' | 'medium' | 'hard' }

    rule languages { <language>+ }
    token language { \w+ }
}

multi sub MAIN(Bool :$test) {
    use Test;
    plan 5;

    # These first 3 pass.
    ok Command.parse('hard', :token<difficulty>), '<difficulty> can parse a difficulty';

    nok Command.parse('no', :token<difficulty>), '<difficulty> should not parse random words';

    # Why does this parse <languages>, but <language> fails below?
    ok Command.parse('js', :rule<languages>), '<languages> can parse a language';

    # These last 2 fail.
    ok Command.parse('js', :token<language>), '<language> can parse a language';

    # Why does this not match both words? Can I use :g somewhere?
    ok Command.parse('js python', :rule<languages>), '<languages> can parse multiple languages';
}

This works, even though my test #4 fails:

my token wrd { \w+ }
'js' ~~ &wrd;  #=> 「js」

Extracting multiple languages works with a regex using this syntax, but I'm not sure how to use that in a grammar:

'js python' ~~ m:g/ \w+ /;  #=> (「js」 「python」)

Also, is there an ideal way to make the order unimportant so that difficulty could come anywhere in the string? Example:

rule TOP { <languages>* <difficulty> <languages>? }

Ideally, I'd like for anything that is not a difficulty to be read as a language. Example: raku python medium js should read medium as a difficulty and the rest as languages.

R891 asked Dec 05 '20 06:12


1 Answer

There are two things at issue here.

To specify a subrule in a grammar parse, the named argument is always :rule, regardless of whether it is declared in the grammar as a rule, token, method, or regex. The unknown :token named argument is silently ignored, so your first two tests actually run against TOP. They pass only by coincidence: 'hard' is a valid full-grammar parse, and 'no' is not.

That gets us:

ok  Command.parse('hard',      :rule<difficulty>), '<difficulty> can parse a difficulty';
nok Command.parse('no',        :rule<difficulty>), '<difficulty> should not parse random words';
ok  Command.parse('js',        :rule<languages> ), '<languages> can parse a language';
ok  Command.parse('js',        :rule<language>  ), '<language> can parse a language';
ok  Command.parse('js python', :rule<languages> ), '<languages> can parse multiple languages';

# Output
ok 1 - <difficulty> can parse a difficulty
ok 2 - <difficulty> should not parse random words
ok 3 - <languages> can parse a language
ok 4 - <language> can parse a language
not ok 5 - <languages> can parse multiple languages

The second issue is how implied whitespace is handled in a rule. In a token, the following are equivalent:

token foo { <alpha>+  }
token bar { <alpha> + }

But in a rule, they would be different. Compare the token equivalents for the following rules:

rule  foo { <alpha>+       } 
token foo { <alpha>+ <.ws> }

rule  bar { <alpha> +         }
token bar { [<alpha> <.ws>] + }
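
To see the difference in practice, here is a small sketch using two lexical rules. The names tight and loose are made up for this illustration; they differ only in the space before the quantifier.

```raku
# Hypothetical rule names, used only for this illustration.
my rule tight { <alpha>+  }   # <.ws> is matched once, after the whole run
my rule loose { <alpha> + }   # <.ws> is matched after every <alpha>

say so 'a b c' ~~ /^ <tight> $/;   # False: stops after 'a' and one run of whitespace
say so 'a b c' ~~ /^ <loose> $/;   # True:  'a', 'b' and 'c' are each followed by <.ws>
```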

In your case, you have <language>+, and since language is \w+, it's impossible to match two: the first one consumes every \w character, and no whitespace is allowed between repetitions. The fix is simple: change <language>+ to <language> +, so that whitespace is matched after each language.
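
Putting that one-character change into the grammar from the question gives roughly the following. This is a sketch that reuses the question's names unchanged; only the languages rule differs.

```raku
grammar Command {
    rule  TOP        { <difficulty> <languages>? }
    token difficulty { 'easy' | 'medium' | 'hard' }

    # The space before `+` makes the rule match <.ws> after every <language>.
    rule  languages  { <language> + }
    token language   { \w+ }
}

my $m = Command.parse('medium javascript python raku');
say ~$m<difficulty>;                       # medium
say $m<languages><language>.map(*.Str);    # (javascript python raku)

# The original multi-word test now succeeds as well.
say so Command.parse('js python', :rule<languages>);   # True
```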

To allow the <difficulty> token to float around, the first solution that jumps to my mind is to check for it inside the <language> token and refuse to match when it is there:

token language { <!difficulty> \w+ }

<!foo> is a negative lookahead: it fails if <foo> can match at that position, and otherwise succeeds without consuming anything. This works almost perfectly until you get a language like 'easyFoo', which begins with a difficulty. The easy fix there is to ensure that the difficulty token always ends at a word boundary:

token difficulty {
   [
   | easy
   | medium
   | hard
   ]
   >> 
}

where >> asserts a word boundary on the right.
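
Combined with the TOP rule proposed in the question, the pieces above can be assembled as follows. Treat this as a sketch rather than a finished design: it accepts exactly one difficulty anywhere in the input, and the way the languages are collected at the end (joining the matched text and splitting on whitespace) is just one convenient choice for the example.

```raku
grammar Command {
    # TOP as suggested in the question: languages may appear on either side.
    rule  TOP        { <languages>* <difficulty> <languages>? }

    # `>>` requires a word boundary, so 'easyFoo' is not a difficulty.
    token difficulty { [ easy | medium | hard ] >> }

    rule  languages  { <language> + }

    # Anything that is not a difficulty counts as a language.
    token language   { <!difficulty> \w+ }
}

my $m = Command.parse('raku python medium js');
say ~$m<difficulty>;                   # medium

# <languages> can match on both sides of the difficulty, so gather the
# matched text from every occurrence and split it back into words.
say $m<languages>.join(' ').words;     # (raku python js)

# A language that merely starts with a difficulty is still a language.
say so Command.parse('hard easyFoo');  # True
```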

user0721090601 answered Nov 15 '22 17:11