raku: markdown grammar to parse sections

Tags: raku

I wanted to create a Raku grammar that can parse a reduced Markdown syntax. This reduced syntax must meet the following criteria:

  • A header must either start with '#' followed by a space, or be underlined with a sequence of at least two '-' characters.
  • Text cannot stand on its own; it must be preceded by a header.

To parse this syntax I created the following script:

#!/usr/bin/perl6

use v6;

grammar gram {
    token TOP {
        <text>
    }
    token text {
        [ <section> ]+
    }
    token section {
        <headline> <textline>*
    }
    token headline {
        ^^ [<hashheadline> | <underlineheadline>] $$
    }
    token hashheadline {
        <hashprefix> <headlinecontent>
    }
    token hashprefix {
        [\#] <space>
    }
    token underlineheadline {
        <headlinecontent> [\n] <underline>
    }
    token underline {
        [\-]**2..*
    }
    token headlinecontent {
        [\N]+
    }
    token textline {
        ^^ (<[\N]-[\#]> (<[\N]-[\ ]> [\N]*)? )? [\n] <!before [\-][\-]>
    }
}

my @tests = "",                                         #should not match and doesn't match - OK
            "test1",                                    #should not match and doesn't match - OK
            "test2\n",                                  #should not match and doesn't match - OK
            "test3\nnewline",                           #should not match and doesn't match - OK
            "test4\n----",                              #should match and does match        - OK
            "test5\n----\nnewline",                     #should match but doesn't match     - NOK
            "#test6\nnewline",                          #should not match and doesn't match - OK
            "# test7\nnewline",                         #should match but doesn't match     - NOK
            "# test8",                                  #should match and does match        - OK
            "test9\n----\nnewline\nanother\nnew line",  #should match but doesn't match     - NOK
            "# test10\nnewline\nhead\n---\nanother",    #should match but doesn't match     - NOK
            ;

for @tests -> $test {
    say gram.parse($test).raku;
}

But there is a problem with this grammar: as the comments in the test array show, several inputs that should match do not, and I don't know why.

asked Feb 27 '21 at 10:02 by byteunit


2 Answers

Change the textline token to:

token textline { \n* <!before <headline>> \N+ \n? }

I have not considered whether that change is what you really want, but with it, all your tests behave as you specified.
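For reference, here is a sketch of the question's grammar with only `textline` swapped for the token above, checked against the expected results written in the question's comments. The `@cases` list is my own restatement of those comments, not code from either post.

```raku
use v6;

# The question's grammar with one change: textline is the token from this answer.
grammar gram {
    token TOP               { <text> }
    token text              { [ <section> ]+ }
    token section           { <headline> <textline>* }
    token headline          { ^^ [<hashheadline> | <underlineheadline>] $$ }
    token hashheadline      { <hashprefix> <headlinecontent> }
    token hashprefix        { [\#] <space> }
    token underlineheadline { <headlinecontent> [\n] <underline> }
    token underline         { [\-]**2..* }
    token headlinecontent   { [\N]+ }
    # Eat the newline(s) a headline or previous line left behind, refuse to
    # start on something that is itself a headline, then take one line of
    # text plus its trailing newline if there is one.
    token textline          { \n* <!before <headline>> \N+ \n? }
}

# Expected results restated from the question's comments (True = should match).
my @cases =
    ""                                        => False,
    "test1"                                   => False,
    "test2\n"                                 => False,
    "test3\nnewline"                          => False,
    "test4\n----"                             => True,
    "test5\n----\nnewline"                    => True,
    "#test6\nnewline"                         => False,
    "# test7\nnewline"                        => True,
    "# test8"                                 => True,
    "test9\n----\nnewline\nanother\nnew line" => True,
    "# test10\nnewline\nhead\n---\nanother"   => True;

for @cases -> $case {
    my $got    = so gram.parse($case.key);
    my $status = $got === $case.value ?? 'OK ' !! 'NOK';
    say "$status {$case.key.raku}";
}
```

If the fix behaves as described, every printed line starts with `OK`. The new token takes over the newline that `headline` leaves unconsumed (`\n*`), accepts a final line with no newline (`\n?`), and `<!before <headline>>` keeps it from swallowing an underlined header such as `head\n---`, so that header starts a new section instead.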


In general, develop grammars with Comma IDE. With it, the location of most problems like the one you've posted becomes immediately obvious. (Finding the solution is of course a distinct step, but pinpointing the problem is often most of the work.)
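Without Comma, a rough substitute for that pinpointing is to call `subparse` with its `:rule` argument on individual tokens and see where each match stops. The sketch below does this for the question's unchanged grammar and its failing test7 input; the probe loop and the choice of rules are mine, not part of the thread.

```raku
# The question's grammar, unchanged apart from layout (one token per line).
grammar gram {
    token TOP               { <text> }
    token text              { [ <section> ]+ }
    token section           { <headline> <textline>* }
    token headline          { ^^ [<hashheadline> | <underlineheadline>] $$ }
    token hashheadline      { <hashprefix> <headlinecontent> }
    token hashprefix        { [\#] <space> }
    token underlineheadline { <headlinecontent> [\n] <underline> }
    token underline         { [\-]**2..* }
    token headlinecontent   { [\N]+ }
    token textline          { ^^ (<[\N]-[\#]> (<[\N]-[\ ]> [\N]*)? )? [\n] <!before [\-][\-]> }
}

my $input = "# test7\nnewline";   # test7 from the question: 15 characters

# subparse, unlike parse, may stop before the end of the string;
# :rule chooses which token to start from.
for <TOP section headline> -> $rule {
    my $m = gram.subparse($input, :$rule);
    say "{$rule} stops at offset {$m ?? $m.to !! 'n/a'} of {$input.chars}";
}

# textline also insists on a trailing "\n", so a last line like this one fails.
say so gram.subparse("newline", :rule<textline>);
```

All three rules should stop at offset 7, right in front of the "\n": `headline` ends at `$$` without consuming the line ending, and the leading `^^` of `textline` cannot match there. The final check exposes the second problem, which the `\n?` in the replacement token also covers.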


In general, debug non-obvious problems by producing a Minimal Example (see the link in my comment on your question, but skip the reproducible part).

Doing so is typically the most efficient way to quickly pinpoint where a non-obvious problem lies.

It's also a fun game, one you will get ever quicker at by combining your intuition with a loose, binary-chop-like approach.
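To illustrate, here is one hypothetical minimal example that such chopping might arrive at for this question (my own reduction, not from the thread): the headline alternatives and character-class details are gone, leaving only the interaction between `^^` and `$$`.

```raku
# Hypothetical minimal grammar distilled from the question's `gram`:
# one headline line, then lines that must start at ^^ and end with "\n".
grammar Mini {
    token TOP      { <headline> <textline>* }
    token headline { ^^ \N+ $$ }      # stops in front of the line's "\n"
    token textline { ^^ \N* \n }      # must begin at a line start
}

say so Mini.parse("head");           # True
say so Mini.parse("head\nbody\n");   # False
```

The second call fails for the same reason as test7 in the question: `headline` stops in front of the "\n", and `textline` cannot start there because `^^` only matches at the start of a line. Deleting `^^` from `textline` makes that call succeed, which confirms the culprit.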


In general, when asking questions on SO, first produce a Minimal Example (as just discussed) and then make it a Minimal Reproducible Example (building on the Minimal Example). (The example in your question was 100% reproducible, thanks! But I'm writing this answer for other readers as well as for you.)

A Minimal Reproducible Example is a matter of insight and efficiency for yourself, and of both those things plus courtesy for others. Solving your problem took me about 1 minute once I understood where the problem lay. But I spent 15 minutes doing what you would "best" have done before asking a question here:

  • Best for you, because it's fun (and will steadily increase your bug-hunting productivity).

  • Best for me, who had the fun that should by rights have been yours.

  • Best for everyone else trying to answer your question so we don't duplicate unnecessary work.

  • Best for later readers, who get simple questions that address real confusion rather than unfortunately complex questions that so obscure the real issue that their value for readers is lost.

  • Best for Rakoons collectively because posting Minimal Reproducible Examples is something moderators and regular readers of StackOverflow consider an essential ingredient of a good question, which means they are more likely to take Raku seriously, help us moderate our questions, and become Rakoons.

That said, I don't mean to discourage you from asking questions, far from it. If, after reading and trying to apply the guidelines on the Minimal Reproducible Example page, you find you are struggling, please go ahead and ask anyway, and explain in the question any problems you had producing a minimal and/or reproducible example, because that will help.

answered Dec 30 '22 at 20:12 by raiph


The Raku module Text::Markdown may be relevant for some users.

answered Dec 30 '22 at 22:12 by p6steve