In the process of writing a translator from one music language to another (ABC to Alda) as an excuse to learn Raku's DSL capabilities, I noticed that there doesn't seem to be a way to terminate a .parse! Here is my shortened demo code:
    #!/home/hsmyers/rakudo741/bin/perl6
    use v6d;
    # use Grammar::Debugger;
    use Grammar::Tracer;

    my $test-n01 = q:to/EOS/;
    a b c d e f g
    A B C D E F G
    EOS

    grammar test {
        token TOP { <score>+ }
        token score {
            <.ws>?
            [
            | <uc>
            | <lc>
            ]+
            <.ws>?
        }
        token uc { <[A..G]> }
        token lc { <[a..g]> }
    }

    test.parse($test-n01).say;
And it is the last part of the Grammar::Tracer display that demonstrates my problem:
    | score
    | | uc
    | | * MATCH "G"
    | * MATCH "G\n"
    | score
    | * FAIL
    * MATCH "a b c d e f g\nA B C D E F G\n"
    「a b c d e f g
    A B C D E F G
    」
On the second-to-last line of the trace, the word FAIL tells me that the .parse run has no clean way of quitting. Is that correct? The .say output shows everything as it should be, so I'm not clear on how real the FAIL is. The question remains: "How do I correctly write a grammar that parses multiple lines without error?"
When you use the grammar debugger, it lets you see exactly how the engine is parsing the string, and fails are normal and expected. Consider, for example, matching a+b* against the string aab. You should get two matches for 'a', followed by a fail (because b is not a), but then the engine moves on to b* and successfully matches the b.
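A minimal sketch of that case in plain Raku (no tracer needed), showing that the internal fail does not stop the overall match from succeeding:

```raku
# a+ takes both "a"s; its attempt at a third "a" fails on "b".
# That internal fail is not an error: the engine moves on to b*,
# which matches the "b", so the match as a whole succeeds.
my $m = "aab" ~~ / a+ b* /;
say ~$m;    # aab
```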
This might be easier to see with an alternation using || (which enforces order). If you have

    rule  TOP   { I have a <fruit> }
    token fruit { apple || orange || kiwi }

(TOP is a rule here so that the spaces between the words are significant; in a token, whitespace in the pattern is ignored) and you parse the sentence "I have a kiwi", you'll see it first match "I have a", followed by two fails with "apple" and "orange", and finally a match with "kiwi".
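A runnable sketch of the fruit example. As far as I know, Grammar::Tracer reports calls to named rules, so this version gives each fruit its own token to make every attempt show up as a separate call; the grammar name Fruit is just for illustration. The tracer line is commented out so the code runs without the Grammar::Debugger distribution installed:

```raku
# Uncomment to trace the parse (needs the Grammar::Debugger
# distribution, which provides Grammar::Tracer):
# use Grammar::Tracer;

grammar Fruit {
    # rule, not token, so the spaces in "I have a" are significant
    rule  TOP    { I have a <fruit> }
    # || tries the alternatives strictly left to right
    token fruit  { <apple> || <orange> || <kiwi> }
    # one named token per fruit, so a tracer can show each attempt
    token apple  { apple }
    token orange { orange }
    token kiwi   { kiwi }
}

my $m = Fruit.parse("I have a kiwi");
say ~$m<fruit>;    # kiwi
```

With the tracer enabled, <apple> and <orange> should each report a FAIL before <kiwi> reports a MATCH, and the parse as a whole still succeeds.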
Now let's look at your case:
    TOP                 # Trying to match TOP (need one or more matches of score)
    | score             # Trying to match score (need one or more matches of lc/uc)
    | | lc              # Trying to match lc
    | | * MATCH "a"     # lc had a successful match! ("a")
    | * MATCH "a "      # and as a result so did score! ("a ")
    | score             # Trying to match score again (because <score>+)
    | | lc              # Trying to match lc
    | | * MATCH "b"     # lc had a successful match! ("b")
    | * MATCH "b "      # and as a result so did score! ("b ")
    ……………             # …and so on until…
    | score             # Trying to match score again (because <score>+)
    | | uc              # Trying to match uc
    | | * MATCH "G"     # uc had a successful match! ("G")
    | * MATCH "G\n"     # and as a result, so did score! ("G\n")
    | score             # Trying to match *score* again (because <score>+)
    | * FAIL            # failed to match score, because no lc/uc is left.
    |
    |                   # <-------------- At this point, the question is: did TOP match?
    |                   # Remember, TOP is <score>+, so TOP matches if at least
    |                   # one <score> matched (14 did) and, because this is .parse,
    |                   # the scores together consumed the whole string (they did).
    |
    * MATCH "a b c d e f g\nA B C D E F G\n"   # this is the TOP match
The fail here is normal: at some point we will run out of <score> tokens, so a fail is inevitable. When that happens, the <score>+ repetition simply stops and the grammar engine moves on to whatever comes after <score>+ in your grammar. Since there's nothing after it, and the scores already consumed the entire string, TOP matches. (.parse requires TOP to match the whole string, as if it were wrapped in an implicit /^…$/.)
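A sketch that checks this without the tracer, using the question's token structure under the hypothetical name Notes. It shows that the trailing FAIL does not hurt a full parse, that .parse returns Nil when TOP cannot reach the end of the string, and that .subparse accepts a match that stops early:

```raku
# Same token structure as the question's grammar (renamed Notes here).
grammar Notes {
    token TOP   { <score>+ }
    token score { <.ws>? [ <uc> | <lc> ]+ <.ws>? }
    token uc    { <[A..G]> }
    token lc    { <[a..g]> }
}

# Every letter is consumed, so the final <score> FAIL only ends <score>+.
my $good = Notes.parse("a b c d e f g\nA B C D E F G\n");
say so $good;               # True
say $good<score>.elems;     # 14

# "X" is outside A..G: <score>+ stops early, TOP cannot reach the end
# of the string, and .parse returns Nil.
my $bad = Notes.parse("a b c\nX\n");
say so $bad;                # False

# .subparse does not require the match to reach the end of the string.
say Notes.subparse("a b c\nX\n").Str.raku;
```

So a FAIL in the trace only matters if it leaves part of the input unconsumed when TOP finishes.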
Also, you might consider rewriting your grammar with a rule, which turns the whitespace in its pattern into implicit <.ws> calls (unless it's important for the separator to be a single space only):
    grammar test {
        rule TOP { <score>+ }
        token score {
            [
            | <uc>
            | <lc>
            ]+
        }
        token uc { <[A..G]> }
        token lc { <[a..g]> }
    }
Further, in my experience, you might also want to add a proto token for uc/lc, because when you have [ <foo> | <bar> ] one of them will always be undefined, which can make processing them in an actions class a bit annoying. You could try:
    grammar test {
        rule TOP { <score> + }
        token score { <letter> + }
        proto token letter { * }
        token letter:uc { <[A..G]> }
        token letter:lc { <[a..g]> }
    }
$<letter> will always be defined this way.
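A sketch of how the proto token pays off in an actions class. It uses the common :sym<…> spelling for the candidates and handles whitespace explicitly with \s* in TOP to keep the example unambiguous; the names Letters and Letters-Actions, and the "upper:"/"lower:" output, are hypothetical placeholders for whatever ABC-to-Alda conversion you would actually do:

```raku
grammar Letters {
    token TOP   { [ <score> \s* ]+ }
    token score { <letter>+ }
    proto token letter {*}
    token letter:sym<uc> { <[A..G]> }
    token letter:sym<lc> { <[a..g]> }
}

# Hypothetical actions: each candidate handles its own case, so score
# never has to check which of $<uc> / $<lc> is defined.
class Letters-Actions {
    method TOP($/)            { make $<score>.map(*.made).join(' ') }
    method score($/)          { make $<letter>.map(*.made).join(' ') }
    method letter:sym<uc>($/) { make 'upper:' ~ $/ }
    method letter:sym<lc>($/) { make 'lower:' ~ $/ }
}

my $m = Letters.parse("a B\nc\n", actions => Letters-Actions);
say $m.made;    # lower:a upper:B lower:c
```

The engine calls the action method named after whichever candidate matched, so each case gets its own method and the score method only ever sees $<letter>.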