
How do I access the optional parts of a grammar in perl6?

Tags: raku

As part of my grammar I have:

        rule EX1        { <EX2> ( '/' <EX2>)*  }

In my actions class I have written:

    method EX1($/) {
            my @ex2s = map *.made,  $/.<EX2>;
            my $ex1 = @ex2s.join('|');
            #say "EX1 making $ex1";
            $/.make($ex1);
    }

So basically I am just trying to join all the EX2's together with a '|' between them instead of a '/'. However something is not right with my code, as it only picks up the first EX2, not the subsequent ones. How do I find out what the optional ones are?

blippy asked Jul 11 '19 20:07


2 Answers

TL;DR Your action method would work if your rule created the data structure the method is expecting. So we'll fix the rule and leave the method alone.

The main problem

Let's assume the EX1 rule is slotted into a working grammar; a string has been successfully parsed; the substring ex2/ex2/ex2 matched the EX1 rule; and we've displayed the corresponding part of the parse tree (simply by passing the result of .parse to say):

EX1 => 「ex2/ex2/ex2」
 EX2 => 「ex2」
 0 => 「/ex2」
  EX2 => 「ex2」
 0 => 「/ex2」
  EX2 => 「ex2」

Note the extraneous 0 => captures and how the second and third EX2s are indented under them and indented relative to the first EX2. That's the wrong nesting structure relative to your method's assumptions.
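
If you wanted to keep the capturing group, the action method could instead walk that nested structure itself. Here's a minimal sketch of that idea. The question doesn't show EX2, so the \w+ token below is a hypothetical stand-in, and the class name is made up for illustration:

```raku
grammar G {
    rule  EX1 { <EX2> ( '/' <EX2>)* }   # the rule from the question, unchanged
    token EX2 { \w+ }                   # hypothetical stand-in for the real EX2
}

class Nested-Actions {
    method EX2($/) { make ~$/ }
    method EX1($/) {
        # $<EX2> is the single leading match; $0 is the list of ( ... ) groups,
        # each of which holds its own <EX2> capture.
        my @ex2s = $<EX2>.made, |$0.map({ .<EX2>.made });
        make @ex2s.join('|');
    }
}

say G.parse('ex2/ex2/ex2', :rule<EX1>, actions => Nested-Actions).made;  # ex2|ex2|ex2
```

This works, but it makes the method mirror the shape of the rule, which is why changing the rule is the tidier fix.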

Brad's solution to the main problem

As Brad++ points out in their comment responding to the first version of this answer, you can simply switch from the construct that both groups and captures ((...)) to the one that only groups ([...]).

    rule EX1        { <EX2> [ '/' <EX2>]*  }

Now the corresponding parse tree fragment for the same input string as above is:

EX1 => 「ex2/ex2/ex2」
 EX2 => 「ex2」
 EX2 => 「ex2」
 EX2 => 「ex2」

The 0 captures are gone and the EX2s are now all siblings. For further discussion of when and why P6 nests captures the way it does, see jnthn's answer to Why/how ... capture groups?.

Your action method should now work -- for some inputs...
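
For reference, here's a small self-contained sketch of the fixed rule driving the action method from the question. The EX2 token (\w+) and the class name are assumptions made for the example, since the question shows neither:

```raku
grammar G {
    rule  EX1 { <EX2> [ '/' <EX2>]* }   # [...] groups without capturing
    token EX2 { \w+ }                   # hypothetical stand-in for the real EX2
}

class Actions {
    method EX2($/) { make ~$/ }
    method EX1($/) {
        # $<EX2> is now a list of sibling matches, so the original map works.
        my @ex2s = map *.made, $/.<EX2>;
        make @ex2s.join('|');
    }
}

say G.parse('ex2/ex2/ex2', :rule<EX1>, actions => Actions).made;  # ex2|ex2|ex2
```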

Håkon's solution to another likely problem

If Brad's solution works for some of the inputs you'd expect it to work for, but not all, part of the problem is likely how your rule matches between <EX2> and the / character.

As Håkon++ points out in their answer, your rule has spacing that probably doesn't do what you want.

If you don't intend the spacing in your pattern to be significant, then don't use a rule. In a token or regex, all spaces in a pattern (other than those inside a quoted string, e.g. ' ') are there only to make your pattern more readable and aren't meaningful relative to any input string being matched. If in doubt, use a token (or regex), not a rule:

token EX1 { <EX2> ( '/' <EX2>)* }
           🡅    🡅 🡅   🡅      🡅  🡅

Spacing indicated with 🡅 is NOT significant. You could omit it or extend it and it'll make no difference to how the token matches input. It's only for readability.

In contrast, the entire point of the rule construct is that whitespace after each atom and each quantifier in a pattern is significant. Such spacing implicitly applies a (user-overridable) boundary-matching rule named ws (by default a rule that allows whitespace and/or a transition between "word" and non-"word" characters) after the corresponding substring in the input.
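
A tiny sketch of the difference, using a made-up grammar with two literal atoms. The grammar and its names are purely illustrative and not from the question:

```raku
grammar Demo {
    token t { 'foo' 'bar' }   # the space between the atoms is ignored
    rule  r { 'foo' 'bar' }   # the space after each atom becomes a <.ws> call
}

say so Demo.parse('foobar',  :rule<t>);  # True
say so Demo.parse('foo bar', :rule<t>);  # False: a token allows no space here
say so Demo.parse('foobar',  :rule<r>);  # False: no boundary between "o" and "b"
say so Demo.parse('foo bar', :rule<r>);  # True
```

The third line fails because the default ws refuses to match in the middle of a run of "word" characters.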

In your EX1 rule, which I repeat below with exaggerated spacing to ensure clarity, some of the spacing is not significant, just as it isn't in a token or regex:

     rule EX1        {  <EX2>   (  '/'  <EX2>)*   }
                      🡅          🡅                 🡅

As before, 🡅 indicates spacing that is NOT significant -- you can omit or extend it and it'll make no difference. The thing to remember is that spaces at the start of a pattern (or sub-pattern) are just for readability. (Experience from use showed that it was much better if any spacing there is not treated as significant.)

But spacing or lack of spacing after an atom or quantifier is significant:

This spacing is significant: ⮟      ⮟        ⮟
     rule EX1        { <EX2>   ( '/'  <EX2>)*   }
This LACK of spacing is significant:      ⮝⮝

By writing your rule as you did you're telling P6 to match input with boundary matching (which by default allows whitespace) only:

  • after the first <EX2> (and thus before the first /);

  • between / and subsequent <EX2> matches;

  • after the last <EX2> match.

So your rule tells P6 to allow spaces between a / and <EX2> match when they occur in that order -- /, then <EX2>.

But it also tells P6 to not allow spaces the other way around -- between an <EX2> match and a / match in that order! Except with the very first <EX2> '/' pair!! P6 will let you declare match patterns of arbitrary complexity, including spacing, but I doubt this is what you meant or want.

For a complete listing of what "after an atom" means (i.e. when whitespace in rules is significant) see When is white space really important in Perl6 grammars?.
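
To see the effect on actual inputs, here's a quick sketch that feeds a few strings to the rule as written in the question. EX2 is again a hypothetical \w+ token, since the real one isn't shown:

```raku
grammar G {
    rule  EX1 { <EX2> ( '/' <EX2>)* }   # the rule from the question
    token EX2 { \w+ }                   # hypothetical stand-in for the real EX2
}

for 'a/b/c', 'a /b', 'a/ b/ c', 'a/b /c' -> $input {
    say $input.raku, ' => ', so G.parse($input, :rule<EX1>);
}
# "a/b/c" => True
# "a /b" => True      (space after the first <EX2> is allowed)
# "a/ b/ c" => True   (space after each / is allowed)
# "a/b /c" => False   (space before a later / is not)
```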

This significant spacing feature is:

  • Classic Perl DWIMery designed to make life easier;

  • Idiomatic -- used in most grammars because it does indeed make life easier;

  • The only reason the rule declarator exists (this significant whitespace aspect is the only difference between a rule and a token);

  • Completely optional because you can just use a token instead.

If someone reading this thinks they'd rather not take advantage of this significant space feature, then they can just use tokens instead. (This in turn will likely lead them to see why rule exists as an option, and then, or perhaps later, to see why it works the way it does, and to appreciate its DWIMery anew. :) )

The built in construct for the pattern you're matching

Finally, here's the idiomatic way to write the pattern you're trying to match:

rule EX1        { <EX2> + % '/' }

This tells P6 to match one or more <EX2>s separated by / characters. See Modified quantifier: %, %% for an explanation of this nice construct.

This is still a rule, so most of the spacing in it remains significant. The precise details of when it is and isn't look their fiddliest for this construct, because it has up to three significant spacers and, apart from the leading space, one that's not:

NOT significant:  ⮟                 ⮟
     rule EX1   {   <EX2>    +    %    '/'   }
Significant:              ⮝    ⮝          ⮝

Including spacing both before and after the + is redundant:

     rule EX1   {   <EX2>    +    %    '/'   }
     rule EX1   {   <EX2>    +%        '/'   } # same match result
     rule EX1   {   <EX2>+        %    '/'   } # same match result
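
Putting the % form together with the action method from the question gives something like the sketch below. As elsewhere, the \w+ EX2 token and the class name are assumptions made for the example:

```raku
grammar G {
    rule  EX1 { <EX2>+ % '/' }   # one or more <EX2>, separated by /
    token EX2 { \w+ }            # hypothetical stand-in for the real EX2
}

class Actions {
    method EX2($/) { make ~$/ }
    method EX1($/) {
        # A quantified capture is always a list, even with a single element.
        my @ex2s = map *.made, $/.<EX2>;
        make @ex2s.join('|');
    }
}

say G.parse('ex2/ex2/ex2', :rule<EX1>, actions => Actions).made;  # ex2|ex2|ex2
say G.parse('ex2',         :rule<EX1>, actions => Actions).made;  # ex2
```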
raiph answered Nov 03 '22 09:11


Whitespace is significant in rules. So I think you are missing whitespace after the last <EX2>:

rule EX1 { <EX2> ( '/' <EX2>)+  }

It should be:

rule EX1 { <EX2> ( '/' <EX2> )+  }

This allows for space to separate the terms in EX1.
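
A small sketch comparing the two versions on a spaced-out input. Both grammar names and the \w+ EX2 token are hypothetical, introduced only for this comparison:

```raku
grammar Before {
    rule  EX1 { <EX2> ( '/' <EX2>)+ }    # no space before the closing paren
    token EX2 { \w+ }                    # hypothetical stand-in for the real EX2
}

grammar After {
    rule  EX1 { <EX2> ( '/' <EX2> )+ }   # space added after the inner <EX2>
    token EX2 { \w+ }
}

my $input = 'a / b / c';
say so Before.parse($input, :rule<EX1>);  # False: stops at the space after "b"
say so After.parse($input,  :rule<EX1>);  # True
```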

Håkon Hægland answered Nov 03 '22 09:11