I'm trying to write a token that allows nested content with matching delimiters. For example, (AB) should result in a match of at least "AB", if not "(AB)", and (A(c)B) should return two matches, "(A(c)B)" and so on.
Code boiled down from its source:
#!/home/hsmyers/rakudo741/bin/perl6
use v6d;
my @tie;
class add-in {
method tie($/) { @tie.push: $/; }
}
grammar tied {
rule TOP { <line>* }
token line {
<.ws>?
[
| <tie>
| <simpleNotes>
]+
<.ws>?
}
token tie {
[
|| <.ws>? <simpleNotes>+ <tie>* <simpleNotes>* <.ws>?
|| <openParen> ~ <closeParen> <tie>
]+
}
token openParen { '(' }
token closeParen { ')' }
token simpleNotes {
[
| <[A..Ga..g,'>0..9]>
| <[|\]]>
| <blank>
]
}
}
my $text = "(c2D) | (aA) (A2 | B)>G A>F G>E (A,2 |\nD)>F A>c d>f |]";
tied.parse($text, actions => add-in.new).say;
$text.say;
for (@tie) {
s:g/\v/\\n/;
say "«$_»";
}
This gives a partially correct result of:
«c2D»
«aA»
«(aA)»
«A2 | B»
«\nD»
«A,2 |\nD»
«(A,2 |\nD)>F A>c d>f |]»
«(c2D) | (aA) (A2 | B)>G A>F G>E (A,2 |\nD)>F A>c d>f |]»
BTW, I'm not concerned about the newline; it is there only to check whether the approach can span text over two lines. So, stirring the ashes, I see captures with and without parentheses, and a very greedy capture or two.
Clearly I have a problem within my code. My knowledge of perl6 can best be described as "beginner", so I ask for your help. I'm looking for a general solution, or at least an example that can be generalized, and as always suggestions and corrections are welcome.
There are a few added complexities that you have. For instance, you define a tie as being either (...) or just the ... . But that inner content is identical to the line.
Here's a rewritten grammar that greatly simplifies what you want. When writing grammars, it's helpful to start from the small and go up.
grammar Tied {
rule TOP { <notes>+ %% \v+ }
token notes {
[
| <tie>
| <simple-note>
] +
%%
<.ws>?
}
token open-tie { '(' }
token close-tie { ')' }
token tie { <.open-tie> ~ <.close-tie> <notes> }
token simple-note { <[A..Ga..g,'>0..9|\]]> }
}
A few stylistic notes here. Grammars are classes, and it's customary to capitalize them. Tokens are methods, and tend to be lower case with kebab casing (you can of course use any style you want, though). In the tie token, you'll notice that I used <.open-tie>. The . means that we don't need to capture it (that is, we're just using it for matching and nothing else). In the notes token I was able to simplify things a lot by using the %% and making TOP a rule, which automatically adds some whitespace matching.
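As a small illustration of the two features just mentioned, the following sketch contrasts % with %% and a token with a rule. The names t and r are hypothetical and exist only for this demonstration; they are not part of the grammar above.

```raku
# `%` allows the separator only between items;
# `%%` additionally tolerates one trailing separator.
say so "a,b,c"  ~~ /^ \w+ %  ',' $/;   # True
say so "a,b,c," ~~ /^ \w+ %  ',' $/;   # False: trailing comma not allowed
say so "a,b,c," ~~ /^ \w+ %% ',' $/;   # True

# In a token, whitespace in the pattern is ignored.
# In a rule, whitespace after an atom becomes an implicit <.ws>.
my token t { a b }
my rule  r { a b }
say so "ab"  ~~ /^ <t> $/;   # True
say so "a b" ~~ /^ <t> $/;   # False: the token does not skip the space
say so "a b" ~~ /^ <r> $/;   # True: the rule matches the space via <.ws>
```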
Now, the order in which I would create the tokens is this:
1. <simple-note>, because it's the most base level item. A group of them would be <notes>, so I make that next.
2. <notes>. While doing that, I realize that a run of notes can also include a…<tie>, so that's the next one.
3. <tie>. Inside of a tie, though, I'm just going to have another run of notes, so I can use <notes> inside it.
4. <TOP> at last, because if a line just has a run of notes, we can omit line and use %% \v+.
Actions (often given the same name as your grammar, plus -Actions, so here I use class Tied-Actions { … }) are normally used to create an abstract syntax tree. But really, the best way to think of this is asking each level of the grammar what we want from it. I find that whereas when writing grammars it's easiest to build from the smallest element up, for actions it's easiest to go from the TOP down. This will also help you build more complex actions down the road:
What do we want from TOP? In this case, just the ties found in each <notes> token. That can be done with a simple loop (because we put a quantifier on <notes>, it will be Positional):
method TOP ($/) {
my @ties;
@ties.append: .made for $<notes>;
make @ties;
}
This loops over each <notes> match and appends everything that <notes> made for us, which is nothing at the moment, but that's okay. Then, because we want the ties from TOP, we make them, which allows us to access them after parsing.
What do we want from <notes>? Again, just the ties, so the action looks almost the same:
method notes ($/) {
my @ties;
@ties.append: .made for $<tie>.grep(*.defined);
make @ties;
}
For $<tie>, we have to grab just the defined ones. This is a consequence of doing the [<foo>|<bar>]+: $<foo> will have a slot for each quantified match, whether or not <foo> did the matching (this is when you would often want to pop things out to, say, a proto token note with a tie and a simple-note variant, but that's a bit advanced for this). Again, we grab whatever $<tie> made for us (we'll define that later) and we "make" it. Whatever we make is what other actions will find made by <notes> (like in TOP).
What do we want from <tie>?
Here I'm going to just go for the content of the tie; it's easy enough to grab the parentheses too if you want. You'd think we'd just use make ~$<notes>, but that leaves off something important: $<notes> also has some ties. But those are easy enough to grab:
method tie ($/) {
my @ties = ~$<notes>;
@ties.append: $<notes>.made;
make @ties;
}
When you parse, all you need to do is grab the .made of the Match:
say Tied.parse("a(b(c))d");
# 「a(b(c))d」
# notes => 「a(b(c))d」
# simple-note => 「a」
# tie => 「(b(c))」 <-- there's a tie!
# notes => 「b(c)」
# simple-note => 「b」
# tie => 「(c)」 <-- there's another!
# notes => 「c」
# simple-note => 「c」
# simple-note => 「d」
say Tied.parse("a(b(c))d", actions => Tied-Actions).made;
# [b(c) c]
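For convenience, here is one way the grammar and the three action methods above might be put together into a single script. It is only a sketch assembled from the pieces in this answer: the class name Tied-Actions follows the naming convention mentioned earlier, and the output noted in the comment is the one reported above rather than a new result.

```raku
grammar Tied {
    rule  TOP         { <notes>+ %% \v+ }
    token notes       {
        [
        | <tie>
        | <simple-note>
        ] +
        %%
        <.ws>?
    }
    token open-tie    { '(' }
    token close-tie   { ')' }
    token tie         { <.open-tie> ~ <.close-tie> <notes> }
    token simple-note { <[A..Ga..g,'>0..9|\]]> }
}

class Tied-Actions {
    # Collect the ties made by every line of notes.
    method TOP ($/) {
        my @ties;
        @ties.append: .made for $<notes>;
        make @ties;
    }
    # Pass up whatever each tie inside this run of notes made.
    method notes ($/) {
        my @ties;
        @ties.append: .made for $<tie>.grep(*.defined);
        make @ties;
    }
    # A tie contributes its own content plus any ties nested inside it.
    method tie ($/) {
        my @ties = ~$<notes>;
        @ties.append: $<notes>.made;
        make @ties;
    }
}

# No attributes are used, so the type object can be passed directly.
say Tied.parse("a(b(c))d", actions => Tied-Actions).made;
# reported above as: [b(c) c]
```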
Now, if you really only will ever need the ties and nothing else (which I don't think is the case), you can do things much more simply. Using the same grammar, use instead the following actions:
class Tied-Actions {
has @!ties;
method TOP ($/) { make @!ties }
method tie ($/) { @!ties.push: ~$<notes> }
}
This has several disadvantages over the previous approach: while it works, it's not very scalable. While you'll get every tie, you won't know anything about its context. Also, you have to instantiate Tied-Actions (that is, actions => Tied-Actions.new), whereas if you can avoid using any attributes, you can pass the type object.
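The difference between passing a type object and passing an instance can be seen in isolation with a much smaller grammar. Everything in this sketch (Words, Stateless-Actions, Stateful-Actions) is hypothetical and only serves to contrast the two styles.

```raku
# Hypothetical mini-grammar: words separated by single spaces.
grammar Words {
    token TOP  { <word>+ % ' ' }
    token word { \w+ }
}

# No attributes: the type object itself works as the actions object.
class Stateless-Actions {
    method TOP ($/) { make $<word>.map(*.Str).List }
}

# Uses an attribute, so an instance is required.
class Stateful-Actions {
    has @!seen;
    method word ($/) { @!seen.push: ~$/ }
    method TOP  ($/) { make @!seen }
}

my @a = Words.parse("do re mi", actions => Stateless-Actions).made;
my @b = Words.parse("do re mi", actions => Stateful-Actions.new).made;
say @a;   # [do re mi]
say @b;   # [do re mi]
```

One thing to keep in mind with the stateful style: if the same instance is reused for a second parse, its attribute still holds the results of the first, so a fresh .new per parse is the safer habit.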