Grammar a bit too greedy in Perl6

I am having problems with this mini-grammar, which tries to match markdown-like header constructs.

role Like-a-word {
    regex like-a-word { \S+ }
}

role Span does Like-a-word {
    regex span { <like-a-word>[\s+ <like-a-word>]* } 
}
grammar Grammar::Headers does Span {
    token TOP {^ <header> \v+ $}

    token hashes { '#'**1..6 }

    regex header {^^ <hashes> \h+ <span> [\h* $0]? $$}
}

I would like it to match ## Easier ## as a header, with Easier as the span, but instead span also takes the closing ##:

TOP
|  header
|  |  hashes
|  |  * MATCH "##"
|  |  span
|  |  |  like-a-word
|  |  |  * MATCH "Easier"
|  |  |  like-a-word
|  |  |  * MATCH "##"
|  |  |  like-a-word
|  |  |  * FAIL
|  |  * MATCH "Easier ##"
|  * MATCH "## Easier ##"
* MATCH "## Easier ##\n"
「## Easier ##
」
 header => 「## Easier ##」
  hashes => 「##」
  span => 「Easier ##」
   like-a-word => 「Easier」
   like-a-word => 「##」

The problem is that [\h* $0]? simply does not seem to work: span gobbles up all the available words. Any ideas?

jjmerelo asked Jan 05 '18 09:01


3 Answers

First, as others have pointed out, <hashes> does not capture into $0; it captures into $<hashes> instead, so you have to write:

regex header {^^ <hashes> \h+ <span> [\h* $<hashes>]? $$}

But that still doesn't match the way you want, because the [\h* $<hashes>]? part happily matches zero occurrences, so nothing forces span to give back the trailing ##.
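The zero-occurrence problem can be reproduced without the grammar. This is only a sketch: plain regexes stand in for the grammar, and a literal '##' stands in for the $<hashes> back-reference to keep it short. With an optional closer, the greedy word list keeps the trailing ##; a mandatory closer forces the backtracking, but then headers without closing hashes no longer match.

```raku
my $header = '## Easier ##';

# Optional closer: it may match zero times, so the greedy word list
# in the capture keeps the trailing '##' and the match still succeeds.
my $optional = $header ~~ / ^ '##' \h+ (\S+ [\h+ \S+]*) [\h* '##']? $ /;
say ~$optional[0];    # Easier ##

# Mandatory closer: the regex has to backtrack until '##' is left over.
my $mandatory = $header ~~ / ^ '##' \h+ (\S+ [\h+ \S+]*) \h* '##' $ /;
say ~$mandatory[0];   # Easier

# ...but then a header without closing hashes fails to match.
say so '## Easier' ~~ / ^ '##' \h+ (\S+ [\h+ \S+]*) \h* '##' $ /;   # False
```

That trade-off is why excluding ## from the words themselves is the more robust fix.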

The proper fix is to not let span match ## as a word:

role Like-a-word {
    regex like-a-word { <!before '#'> \S+ }
}

role Span does Like-a-word {
    regex span { <like-a-word>[\s+ <like-a-word>]* } 
}
grammar Grammar::Headers does Span {
    token TOP {^ <header> \v+ $}

    token hashes { '#'**1..6 }

    regex header {^^ <hashes> \h+ <span> [\h* $<hashes>]? $$}
}

say Grammar::Headers.subparse("## Easier ##\n", :rule<header>);

If you are loath to modify like-a-word, you can instead forbid span from ending in a # like this:

role Like-a-word {
    regex like-a-word { \S+ }
}

role Span does Like-a-word {
    regex span { <like-a-word>[\s+ <like-a-word>]* } 
}
grammar Grammar::Headers does Span {
    token TOP {^ <header> \v+ $}

    token hashes { '#'**1..6 }

    regex header {^^ <hashes> \h+ <span> <!after '#'> [\h* $<hashes>]? $$}
}

say Grammar::Headers.subparse("## Easier ##\n", :rule<header>);
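<!after '#'> is a zero-width negative lookbehind: it fails whenever the character just before the current position is a #. Because span is a regex rather than a token, it can backtrack, so it keeps giving back words until it no longer ends in #. A stripped-down sketch of that mechanism, with a plain capture standing in for <span>:

```raku
# The capture plays the role of <span>: a greedy, backtracking word list.
# <!after '#'> rejects every candidate ending in '#', so the regex
# backtracks until only 'Easier' is left inside the capture.
my $m = 'Easier ##' ~~ / ^ (\S+ [\h+ \S+]*) <!after '#'> /;
say ~$m[0];   # Easier
```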
moritz answered Nov 18 '22 07:11


Just change

  regex header {^^ <hashes> \h+ <span> [\h* $0]? $$}

to

  regex header {^^ (<hashes>) \h+ <span> [\h* $0]? $$}

That way <hashes> is also captured into $0. Thanks to Eugene Barsky for pointing this out.
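The difference between the two capture styles can be checked in isolation. In this small sketch, the lexical token hashes mirrors the grammar's token and the test string is only an example. A bare <hashes> call fills only the named slot $<hashes>, while wrapping it in parentheses also creates the positional capture $0.

```raku
# Lexical token, called below as <hashes> just like the grammar's token.
my token hashes { '#' ** 1..6 }

# A bare subrule call stores its match under the name 'hashes' only.
my $named      = '## Easier ##' ~~ / <hashes> /;
# Parentheses add a positional capture, $0, around the subrule call.
my $positional = '## Easier ##' ~~ / (<hashes>) /;

say ~$named<hashes>;      # ##
say $named.list.elems;    # 0 (no positional captures)
say ~$positional[0];      # ##
```

Note that this only makes $0 available; on its own it does not stop span from also taking the closing ##.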

jjmerelo answered Nov 18 '22 07:11


I played with this a bit because I thought there were two interesting things you might do.

First, you can make hashes take an argument that says how many hash marks it should match. That way you can do special things based on the header level if you like, and you can reuse hashes in different parts of the grammar that require different but exact numbers of hash marks.
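A minimal sketch of such a parameterized token. The grammar name Levels and the rule h2 are made up for illustration. hashes receives the exact count, and h2 accepts only second-level headers.

```raku
grammar Levels {
    # The argument fixes exactly how many '#' characters must match.
    token hashes($n) { '#' ** {$n} }

    # Hypothetical rule for a level-2 header: exactly two hashes.
    token h2 { <hashes(2)> \h+ \N+ }
}

say so Levels.parse('## Easier',  :rule<h2>);   # True
say so Levels.parse('### Easier', :rule<h2>);   # False
```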

Next, the ~ stitcher lets you specify that something shows up between two other things, so you can write those wrapping delimiters next to each other. For example, to match (Foo) you could write '(' ~ ')' Foo. With that, it looks like I came up with the same thing you posted:

use Grammar::Tracer;

role Like-a-word {
    regex like-a-word { \S+ }
}

role Span does Like-a-word {
    regex span { <like-a-word>[\s+ <like-a-word>]* }
}

grammar Grammar::Headers does Span {
    token TOP {^ <header> \v+ $}

    token hashes ( $n = 1 ) { '#' ** {$n} }

    regex header { [(<hashes(2)>) \h*] ~ [\h* $0] <span>  }
}

my $result = Grammar::Headers.parse( "## Easier ##\n" );

say $result;
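Here is the ~ stitcher on its own, using the (Foo) example as a tiny grammar. The names Parens and content are illustrative. As I understand the documentation, '(' ~ ')' <content> matches the same text as '(' <content> ')'. The difference is that a missing closing delimiter produces an error naming the expected goal instead of a silent failure.

```raku
grammar Parens {
    # Matches the same text as: '(' <content> ')'
    token TOP     { '(' ~ ')' <content> }
    token content { \w+ }
}

my $m = Parens.parse('(Foo)');
say ~$m<content>;   # Foo
```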
brian d foy answered Nov 18 '22 06:11