I'm trying to implement a Markdown parser with a Perl 6 grammar and got stuck on blockquotes. A blockquote paragraph cannot be expressed in terms of nested braces because it is a list of specifically formatted lines. Semantically, however, it is nested Markdown.
Basically, it all came down to the following definition:
    token mdBlockquote {
        <mdBQLine>+ {
            my $quoted = [~] $<mdBQLine>.map: { $_<mdBQLineBody> };
        }
    }
The actual implementation of the mdBQLine token is not relevant here. The only important thing to note is that the mdBQLineBody key contains the actual quoted line with the > already stripped off. So, for a block:
> # quote1
> quote2
>
> quote3
quote3.1
the $quoted scalar will contain:
# quote1
quote2
quote3
quote3.1
Now, the whole point is to have the above data parsed and injected back into the Match object $/. And this is where I'm totally stuck. The most apparent solution:
    token mdBlockquote {
        <mdBQLine>+ {
            my $quoted = [~] $<mdBQLine>.map: { $_<mdBQLineBody> };
            $<mdBQParsed> = self.parse( $quoted, actions => self.actions );
        }
    }
fails for two reasons at once: first, $/ is a read-only object; second, .parse modifies it, effectively making it impossible to inject anything into the original tree.
Is there any solution other than post-analysing the parsed data, extracting and re-parsing the blockquotes, and repeating?
Expanding a little on @HåkonHægland's comment...
$/ is a read-only object ... effectively making it impossible to inject anything into the original tree.
Not quite:
Pedantically speaking, $/ is a symbol and never an object, whether or not it's bound to one. If it's a parameter (and not declared with is rw or is copy), then it's read-only, but otherwise it can be freely rebound, e.g. $/ := 42.
But what you're referring to is assignment to a key. The semantics of assignment are determined by the item(s) being assigned to. If they're ordinary objects that are not containers then they won't support lvalue semantics and you'll get a Cannot modify an immutable ... error if you try to assign to them. A Match object is immutable in this sense.
What you can do is hang arbitrary data off any Match object by using the .make method on it. (The make routine calls this method on $/.) This is how you store custom data in a parse tree.
To access what's made in a given node of a parse tree / Match object, call .made (or .ast, which is a synonym) on that node.
Typically what you make for higher nodes in a parse tree includes what was made for lower level nodes.
Please try the following untested code out and see what you get, then comment if it fails miserably and you can't figure out a way to make it work, or build from there taking the last two paragraphs above into consideration, and comment on how it works out:
    token mdBlockquote {
        <mdBQLine>+ {
            make .parse: [~] $<mdBQLine>.map: { $_<mdBQLineBody> };
        }
    }
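For reference, the make / made mechanism described above can be seen in isolation in a minimal sketch. The Addition grammar and AdditionActions class are hypothetical names invented for this illustration and have nothing to do with the Markdown grammar; the point is only that a lower node stores a value with make and a higher node builds its own value from what the lower nodes made.

```raku
# Hypothetical toy grammar, used only to illustrate make / made.
grammar Addition {
    token TOP { <num> '+' <num> }
    token num { \d+ }
}

class AdditionActions {
    # Lower node: attach the numeric value of the matched digits.
    method num($/) { make +$/ }
    # Higher node: build its payload from what the lower nodes made.
    method TOP($/) { make $<num>.map(*.made).sum }
}

my $match = Addition.parse('2+3', actions => AdditionActions);
say $match.made;          # 5
say $match<num>[0].made;  # 2
```

The Match objects themselves are never assigned to; the custom data lives in the slot that .make fills and .made reads.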
Ok, here is the final solution I used. The grammar rule looks like this:
    token mdBlockquote {
        <mdBQLine>+ {
            my $m = $/;
            my $bq-body = [~] $m<mdBQLine>.map( { $_<mdBQLineBody> } );
            $m.make(
                self.WHAT.parse(
                    $bq-body,
                    actions => self.actions.clone,
                )
            );
        }
    }
The important tricks here are the following.
First, $/ is backed up in $m because .parse will replace it.
Second, the blockquote body is prefetched into $bq-body before calling .parse because there was a confusing side effect when the expression was passed directly as an argument.
Third, .parse is called on self.WHAT to avoid interfering with the current grammar object.
This rule will end up with $m.ast containing a Match object, which in turn contains the actions-generated data. The corresponding actions method then does the following:
    method mdBlockquote ($m) {
        my $bq = self.makeNode( "Blockquote" );
        $bq.push( $m.ast.ast );
        $m.make( $bq );
    }
Since the actions object builds a streamlined AST suitable for simple translation of Markdown into other formats, what happens here is that it fetches a branch of that tree generated by a recursive .parse and grafts it into the main tree.
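The recursive re-parse can also be tried in isolation. The following is only a sketch under stated assumptions: MiniQuote, bqline, body and plain are hypothetical names, the grammar knows nothing but plain lines and blockquotes, every line is assumed to end with a newline, and no actions object is involved, so each blockquote node simply carries the nested Match as its made value.

```raku
# Hypothetical miniature grammar: only plain lines and blockquotes.
grammar MiniQuote {
    token TOP        { [ <blockquote> || <plain> ]* }
    token blockquote {
        <bqline>+ {
            my $m    = $/;   # back up $/ before the nested .parse replaces it
            my $body = [~] $m<bqline>.map({ ~$_<body> });
            # Re-parse the unquoted text with the grammar's type object
            # and attach the resulting Match to this node.
            $m.make( self.WHAT.parse($body) );
        }
    }
    token bqline { '>' ' '? $<body>=[ \N* \n ] }
    token plain  { <!before '>'> \N* \n }
}

my $outer = MiniQuote.parse("> a\n> > b\nc\n");
my $inner = $outer<blockquote>[0].made;              # Match for the unquoted body
say $inner<plain>[0].Str.trim;                       # plain line inside the quote
say $inner<blockquote>[0].made<plain>[0].Str.trim;   # line inside the nested quote
say $outer<plain>[0].Str.trim;                       # plain line after the quote
```

With an actions class in place, the same made slot would hold a Match whose own .made is the sub-tree to graft, which is what $m.ast.ast reaches for in the actions method.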
What is great is that this approach supports nested blockquotes out of the box; no special handling is needed. What is not so good is that it is still a lot of extra code, whereas something like:
    token mdBlockquote {
        <mdBQLine>+ $<mdBQBody>={
            my $bq-body = [~] $<mdBQLine>.map( { $_<mdBQLineBody> } );
            self.WHAT.parse(
                $bq-body,
                actions => self.actions.clone,
            );
        }
    }
would look way better and wouldn't require the actions object to intervene beyond its normal duties. 😀