How to implement a recursive grammar in Perl6

Tags:

raku

I'm trying to implement a Markdown parser with a Perl6 grammar and got stuck on blockquotes. A blockquote paragraph cannot be expressed in terms of nested braces because it is a list of specifically formatted lines. But semantically it is nested Markdown.

Basically, it all came down to the following definition:

    token mdBlockquote {
        <mdBQLine>+ {
            my $quoted = [~] $<mdBQLine>.map: { $_<mdBQLineBody> };
        }
    }

The actual implementation of the mdBQLine token is not relevant here. The only important thing to note is that the mdBQLineBody key contains the actual quoted line with the leading > already stripped off. For example, for the block:

    > # quote1
    > quote2
    >
    > quote3
    quote3.1

the $quoted scalar will contain:

    # quote1
    quote2

    quote3
    quote3.1

Now, the whole point is to have the above data parsed and injected back into the Match object $/. This is where I'm totally stuck. The most obvious solution:

    token mdBlockquote {
        <mdBQLine>+ {
            my $quoted = [~] $<mdBQLine>.map: { $_<mdBQLineBody> };
            $<mdBQParsed> = self.parse( $quoted, actions => self.actions );
        }
    }

fails for two reasons at once: first, $/ is a read-only object; second, .parse modifies it, effectively making it impossible to inject anything into the original tree.

Is there any solution other than post-processing the parsed data: extracting and re-parsing blockquotes, then repeating...?

asked Jul 16 '18 03:07 by Vadim Belman

2 Answers

Expanding a little on @HåkonHægland's comment...

$/ is a read-only object ... effectively making it impossible to inject anything into the original tree.

Not quite:

  • Pedantically speaking, $/ is a symbol, never an object, whether or not it's bound to one. If it's a parameter (and not declared with is rw or is copy), then it's read-only; otherwise it can be freely rebound, e.g. $/ := 42.

  • But what you're referring to is assignment to a key. The semantics of assignment is determined by the item(s) being assigned to. If they're ordinary objects that are not containers then they won't support lvalue semantics and you'll get a Cannot modify an immutable ... error if you try to assign to them. A Match object is immutable in this sense.
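Both points can be checked with a minimal sketch (the sub name rebind-demo and the key extra are made up for illustration). Assigning into a Match, as the question does with $<mdBQParsed>, throws, while the $/ symbol itself can be rebound:

```raku
sub rebind-demo {
    'hello' ~~ /(l+)/;        # sets this sub's own $/ to a Match object
    # Like $<mdBQParsed> = ... in the question: what a Match hands back
    # for a key is not a container, so this assignment throws.
    my $assigned = (try { $<extra> = 'x'; True }) // False;
    # The $/ symbol is just a lexical of this sub, so it can be rebound.
    $/ := 42;
    return $assigned, $/;
}

my ($assigned, $rebound) = rebind-demo;
say $assigned;   # False
say $rebound;    # 42
```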

What you can do is hang arbitrary data off any Match object by using the .make method on it. (The make routine calls this method on $/.) This is how you store custom data in a parse tree.

To access what's made in a given node of a parse tree / Match object, call .made (or .ast which is a synonym) on that node.

Typically what you make for higher nodes in a parse tree includes what was made for lower level nodes.
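A minimal sketch of .make/.made on a hypothetical key=value grammar (KV and KVActions are invented for illustration): the leaf node makes an Int, and the top node builds on what its child made.

```raku
grammar KV {
    token TOP   { <key> '=' <value> }
    token key   { \w+ }
    token value { \d+ }
}

class KVActions {
    # Leaf node: attach an Int instead of leaving only the matched text.
    method value($/) { make +$/ }
    # Higher node: build on what the lower node made.
    method TOP($/)   { make (~$<key> => $<value>.made) }
}

my $m = KV.parse('answer=42', actions => KVActions);
say $m<value>.made;   # 42
say $m.made;          # answer => 42
say $m.ast;           # answer => 42 (.ast is a synonym for .made)
```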

Please try the following untested code and see what you get. If it fails miserably and you can't figure out a way to make it work, comment; otherwise build from there, taking the last two paragraphs above into consideration, and comment on how it works out:

    token mdBlockquote {
        <mdBQLine>+ {
            make .parse: [~] $<mdBQLine>.map: { $_<mdBQLineBody> };
        }
    }

answered Oct 08 '22 15:10 by raiph

Ok, here is the final solution I used. The grammar rule looks like this:

    token mdBlockquote {
        <mdBQLine>+ {
            my $m = $/;
            my $bq-body =  [~] $m<mdBQLine>.map( { $_<mdBQLineBody> } ); 
            $m.make(
                self.WHAT.parse(
                    $bq-body,
                    actions => self.actions.clone,
                )
            );
        }
    }

The important trick here is backing up $/ into $m, because .parse will replace it.

The blockquote body is computed into $bq-body before calling .parse, because passing the expression directly as an argument caused a confusing side effect.

.parse is called on self.WHAT to avoid interfering with the current grammar object.
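The first trick can be seen in isolation with a tiny sketch (the grammar Digits and the sub clobber-demo are hypothetical). Grammar.parse stores its result in the caller's $/, so whatever was held there beforehand is lost unless it is copied first:

```raku
grammar Digits { token TOP { \d+ } }

sub clobber-demo {
    'abc' ~~ /b/;             # this sub's $/ now holds the match for 'b'
    my $saved = $/;           # back up $/ before calling .parse
    Digits.parse('42');       # .parse assigns its result to the caller's $/
    say ~$saved;              # b
    say ~$/;                  # 42
    return ~$saved, ~$/;
}

my ($before, $after) = clobber-demo;
```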

This rule ends up with $m.ast containing a Match object, which in turn contains the data generated by the actions. The corresponding actions method then does the following:

    method mdBlockquote ($m) {
        my $bq = self.makeNode( "Blockquote" );
        $bq.push( $m.ast.ast );
        $m.make( $bq );
    }

Since the actions object builds a streamlined AST suitable for simple translation of Markdown into other formats, this method takes the branch of that tree generated by the recursive .parse and grafts it into the main tree.
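For comparison, here is a minimal self-contained sketch that moves the recursive .parse out of the grammar and into the actions class. This is a swapped-in variation, not the solution above; the grammar MiniMD, its rules and MiniMDActions are hypothetical, and only >-quoted lines and one-line paragraphs are recognized (no lazy continuation lines such as quote3.1):

```raku
# Hypothetical mini-grammar: '>'-quoted lines and one-line paragraphs only.
grammar MiniMD {
    token TOP        { [ <block> | \n ]* }
    token block      { <blockquote> | <para> }
    token blockquote { <bq-line>+ }
    # $<body> is the line with the '>' marker and one optional space removed.
    token bq-line    { '>' ' '? $<body>=[\N*] \n? }
    token para       { <!before '>'> $<text>=[\N+] \n? }
}

class MiniMDActions {
    # Parameters are named $m, not $/, so every method calls $m.make explicitly.
    method TOP($m)   { $m.make: $m<block>.map(*.made).List }
    method block($m) { $m.make: ($m<blockquote> // $m<para>).made }
    method para($m)  { $m.make: ~$m<text> }
    method blockquote($m) {
        # Re-assemble the quoted text without the '>' markers.
        my $body = $m<bq-line>.map({ ~$_<body> ~ "\n" }).join;
        # Recursive step: .parse assigns to this method's own $/, which is
        # harmless because the Match we need is held in $m.
        $m.make: (quote => MiniMD.parse($body, actions => self).made);
    }
}

my $doc  = "> # q1\n> q2\n>\n> > nested\nplain\n";
my $tree = MiniMD.parse($doc, actions => MiniMDActions).made;
say $tree.raku;
```

Here $tree should be a two-element list: a quote pair whose value holds "# q1", "q2" and a further nested quote pair, followed by "plain". Keeping the recursion in the actions leaves the grammar declarative and avoids rescuing $/ inside a regex code block; the trade-off is that parsing without actions no longer descends into blockquotes.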

What is great is that the code supports nested blockquotes out of the box; no special handling is needed. What is not so good is that it is still a lot of extra code, whereas something like:

    token mdBlockquote {
        <mdBQLine>+ $<mdBQBody>={
            my $bq-body =  [~] $<mdBQLine>.map( { $_<mdBQLineBody> } ); 
            self.WHAT.parse(
                $bq-body,
                actions => self.actions.clone,
            );
        }
    }

would look much better and wouldn't require the actions object to do anything beyond its normal duties. 😀

answered Oct 08 '22 17:10 by Vadim Belman