
How to use matching delimiters in Raku

Tags:

grammar

raku

I'm trying to write a token that allows nested content with matching delimiters, where (AB) should result in a match to at least "AB" if not "(AB)", and (A(c)B) would return two matches, "(A(c)B)" and so on.

Code boiled down from its source:

#!/home/hsmyers/rakudo741/bin/perl6
use v6d;

my @tie;

class add-in {
    method tie($/) { @tie.push: $/; }
}

grammar tied {
    rule TOP { <line>* }
    token line {
        <.ws>?
        [
            | <tie>
            | <simpleNotes>
        ]+
        <.ws>?
    }
    token tie {
        [
            || <.ws>? <simpleNotes>+ <tie>* <simpleNotes>* <.ws>?
            || <openParen> ~ <closeParen> <tie>
        ]+
    }
    token openParen { '(' }
    token closeParen { ')' }
    token simpleNotes {
        [
            | <[A..Ga..g,'>0..9]>
            | <[|\]]>
            | <blank>
        ]
    }
}

my $text = "(c2D) | (aA) (A2 | B)>G A>F G>E (A,2 |\nD)>F A>c d>f |]";

tied.parse($text, actions => add-in.new).say;
$text.say;
for (@tie) {
    s:g/\v/\\n/;
    say "«$_»";
}

This gives a partially correct result of:

«c2D»
«aA»
«(aA)»
«A2 | B»
«\nD»
«A,2 |\nD»
«(A,2 |\nD)>F A>c d>f |]»
«(c2D) | (aA) (A2 | B)>G A>F G>E (A,2 |\nD)>F A>c d>f |]»

BTW, I'm not concerned about the newline; it is there only to check whether the approach can span text over two lines. So, stirring the ashes, I see captures with and without parentheses, and a very greedy capture or two.

Clearly I have a problem within my code. My knowledge of perl6 can best be described as "beginner", so I ask for your help. I'm looking for a general solution, or at least an example that can be generalized, and as always, suggestions and corrections are welcome.

hsmyers asked Jan 17 '20 00:01


1 Answer

You have added a few complexities. For instance, you define a tie as being either (...) or just the .... But those inner contents are identical to the line.

Here's a rewritten grammar that greatly simplifies what you want. When writing grammars, it's helpful to start from the small and go up.

grammar Tied {
    rule  TOP   { <notes>+ %% \v+ }
    token notes {
        [
        | <tie>
        | <simple-note>
        ] + 
        %%
        <.ws>?
    }
    token open-tie    { '(' }
    token close-tie   { ')' }
    token tie         { <.open-tie> ~ <.close-tie> <notes> }
    token simple-note { <[A..Ga..g,'>0..9|\]]>             }
}

A few stylistic notes here. Grammars are classes, and it's customary to capitalize them. Tokens are methods, and tend to be lower case with kebab casing (you can of course use any style you want, though). In the tie token, you'll notice that I used <.open-tie>. The . means that we don't need to capture it (that is, we're just using it for matching and nothing else). In the notes token I was able to simplify things a lot by using the %% and making TOP a rule, which automatically adds some whitespace matching.
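To make the two operators mentioned above concrete, here is a minimal sketch using plain regexes rather than the grammar itself. The variable names are made up for illustration. It only aims to show that %% tolerates a trailing separator where % does not, and that ~ lets you write the two delimiters next to each other.

```raku
# Sketch only: plain regexes, not the grammar from this answer.

# `%%` quantifies with a separator and tolerates one trailing separator;
# `%` requires the match to end on an item, not on a separator.
my $trailing-ok  = so "a,b,c," ~~ / ^ \w+ %% ',' $ /;
my $trailing-bad = so "a,b,c," ~~ / ^ \w+ %  ',' $ /;

# `A ~ B C` matches A, then C, then B, so the opening and closing
# delimiters can be written side by side.
my $balanced = so "(abc)" ~~ / ^ '(' ~ ')' \w+ $ /;

say $trailing-ok;   # True
say $trailing-bad;  # False
say $balanced;      # True
```

In the grammar, <.open-tie> ~ <.close-tie> <notes> uses the same idea, with <notes> as the inner content.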

Now, the order that I would create the tokens is this:

  1. <simple-note> because it's the most base level item. A group of them would be
  2. <notes>, so I make that next. While doing that, I realize that a run of notes can also include a…
  3. <tie>, so that's the next one. Inside of a tie though I'm just going to have another run of notes, so I can use <notes> inside it.
  4. <TOP> at last, because if a line just has a run of notes, we can omit line and use %% \v+

Actions (often given the same name as your grammar, plus -Actions, so here I use class Tied-Actions { … }) are normally used to create an abstract syntax tree. But really, the best way to think of this is to ask what we want from each level of the grammar. I find that, whereas with grammars it's easiest to build from the smallest element up, with actions it's easiest to go from TOP down. This will also help you build more complex actions down the road:

  1. What do we want from TOP?
    In our case, we just want all the ties that we found in each <notes> token. That can be done with a simple loop (because we put a quantifier on <notes>, it will be Positional):
    method TOP ($/) {  my @ties; @ties.append: .made for $<notes>; make @ties; }
    The above code creates our temp variable, loops through each <notes> match, and appends everything that <notes> made for us, which is nothing at the moment, but that's okay. Then, because we want the ties from TOP, we make them, which allows us to access them after parsing.
  2. What do you want from <notes>?
    Again, we just want the ties (but maybe some other time, you want ties and glisses, or some other information). So we can grab the ties basically doing the exact same thing:
    method notes ($/) {  my @ties; @ties.append: .made for $<tie>.grep(*.defined); make @ties; }
    The only difference is that, rather than doing just for $<tie>, we have to grab just the defined ones. This is a consequence of doing the [<foo>|<bar>]+: $<foo> will have a slot for each quantified match, whether or not <foo> did the matching (this is when you would often want to pop things out to, say, proto token note with a tie and a simple note variant, but that's a bit advanced for this). Again, we grab whatever $<tie> made for us (we'll define that later) and we "make" it. Whatever we make is what other actions will find made by <notes> (like in TOP).
  3. What do you want from <tie>? Here I'm going to just go for the content of the tie; it's easy enough to grab the parentheses too if you want. You'd think we'd just use make ~$<notes>, but that leaves off something important: $<notes> also has some ties. But those are easy enough to grab:
    method tie ($/) { my @ties = ~$<notes>; @ties.append: $<notes>.made; make @ties; }
    This ensures that we pass along not only the current outer tie, but also each individual inner tie (which in turn may have another inner one, and so on).
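For convenience, the three action methods above can be put together with the grammar into one script. This is only a sketch assembled from the pieces in this answer. The variable @found is a name made up for illustration, and the class is named Tied-Actions following the naming convention mentioned earlier.

```raku
grammar Tied {
    rule  TOP   { <notes>+ %% \v+ }
    token notes {
        [
        | <tie>
        | <simple-note>
        ]+
        %%
        <.ws>?
    }
    token open-tie    { '(' }
    token close-tie   { ')' }
    token tie         { <.open-tie> ~ <.close-tie> <notes> }
    token simple-note { <[A..Ga..g,'>0..9|\]]> }
}

class Tied-Actions {
    # TOP: collect the ties reported by every <notes> match.
    method TOP ($/) {
        my @ties;
        @ties.append: .made for $<notes>;
        make @ties;
    }
    # notes: pass along whatever each <tie> inside this run made.
    method notes ($/) {
        my @ties;
        @ties.append: .made for $<tie>.grep(*.defined);
        make @ties;
    }
    # tie: report this tie's content, followed by any ties nested in it.
    method tie ($/) {
        my @ties = ~$<notes>;
        @ties.append: $<notes>.made;
        make @ties;
    }
}

# No attributes are used, so the type object can be passed directly.
my @found = Tied.parse("a(b(c))d", actions => Tied-Actions).made;
say @found;
```

Because each level only reports what the level below it made, adding another kind of result later should only require touching the methods on the path from that token up to TOP.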

When you parse, all you need to do is grab the .made of the Match:

say Tied.parse("a(b(c))d");
# 「a(b(c))d」
# notes => 「a(b(c))d」
#  simple-note => 「a」
#  tie => 「(b(c))」          <-- there's a tie!
#   notes => 「b(c)」
#    simple-note => 「b」
#    tie => 「(c)」           <-- there's another!
#     notes => 「c」
#      simple-note => 「c」
#  simple-note => 「d」
say Tied.parse("a(b(c))d", actions => Tied-Actions).made;
# [b(c) c]

Now, if you really will only ever need the ties and nothing else (which I don't think is the case), you can do things much more simply. Using the same grammar, use the following actions instead:

class Tied-Actions {
    has @!ties;
    method TOP ($/) { make @!ties            }
    method tie ($/) { @!ties.push: ~$<notes> }
}

This has several disadvantages compared to the previous version: while it works, it's not very scalable. While you'll get every tie, you won't know anything about its context. Also, you have to instantiate Tied-Actions (that is, actions => Tied-Actions.new), whereas if you can avoid using any attributes, you can pass the type object.

user0721090601 answered Nov 15 '22 08:11