I am having problems with this mini-grammar, which tries to match markdown-like header constructs.
role Like-a-word {
    regex like-a-word { \S+ }
}
role Span does Like-a-word {
    regex span { <like-a-word>[\s+ <like-a-word>]* }
}
grammar Grammar::Headers does Span {
    token TOP {^ <header> \v+ $}
    token hashes { '#'**1..6 }
    regex header {^^ <hashes> \h+ <span> [\h* $0]? $$}
}
I would like it to match ## Easier ## as a header, but instead it takes ## as part of span:
TOP
| header
| | hashes
| | * MATCH "##"
| | span
| | | like-a-word
| | | * MATCH "Easier"
| | | like-a-word
| | | * MATCH "##"
| | | like-a-word
| | | * FAIL
| | * MATCH "Easier ##"
| * MATCH "## Easier ##"
* MATCH "## Easier ##\n"
「## Easier ##
」
 header => 「## Easier ##」
  hashes => 「##」
  span => 「Easier ##」
   like-a-word => 「Easier」
   like-a-word => 「##」
The problem is that the [\h* $0]? simply does not seem to work, with span gobbling up all available words. Any ideas?
First, as others have pointed out, <hashes> does not capture into $0; it captures into $<hashes>, so you have to write:
regex header {^^ <hashes> \h+ <span> [\h* $<hashes>]? $$}
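To make the difference between the two capture kinds concrete, here is a minimal sketch outside the grammar. The lexical token and the sample string are made up for illustration; only the capture behaviour is the point.

```raku
# A lexical token standing in for the grammar's `hashes` (illustrative only).
my token hashes { '#' ** 1..6 }

# <hashes> produces a named capture; the parentheses produce a positional one.
my $m = '## Easier' ~~ / <hashes> \h+ (\w+) /;

say $m<hashes>.Str;   # ##      (named capture, also reachable as $<hashes>)
say $m[0].Str;        # Easier  (positional capture, also reachable as $0)
```

So in the original header regex, $0 had nothing to refer to, because no parenthesized group had been matched.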
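But that still doesn't match the way you want, because the [\h* $<hashes>]? part happily matches zero occurrences: span has already consumed the trailing ##, so the optional group matches nothing and $$ still succeeds.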
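The effect can be reproduced with a plain regex. This is a simplified sketch, not the grammar itself: a greedy (.+) stands in for span, and a positional capture stands in for hashes.

```raku
# Greedy (.+) plays the role of span; ('#' ** 1..6) plays the role of hashes.
my $m = '## Easier ##' ~~ / ^ ('#' ** 1..6) \h+ (.+) [\h* $0]? $ /;

# The greedy group swallows the closing hashes, and the optional
# trailer is satisfied by matching zero times.
say $m[1].Str;   # Easier ##
```

The overall match succeeds, which is why no backtracking is triggered to hand the closing ## back to the optional group.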
The proper fix is to not let span match ## as a word:
role Like-a-word {
    regex like-a-word { <!before '#'> \S+ }
}
role Span does Like-a-word {
    regex span { <like-a-word>[\s+ <like-a-word>]* }
}
grammar Grammar::Headers does Span {
    token TOP {^ <header> \v+ $}
    token hashes { '#'**1..6 }
    regex header {^^ <hashes> \h+ <span> [\h* $<hashes>]? $$}
}
say Grammar::Headers.subparse("## Easier ##\n", :rule<header>);
If you are loath to modify like-a-word, you can also force the exclusion of a final # from it like this:
role Like-a-word {
    regex like-a-word { \S+ }
}
role Span does Like-a-word {
    regex span { <like-a-word>[\s+ <like-a-word>]* }
}
grammar Grammar::Headers does Span {
    token TOP {^ <header> \v+ $}
    token hashes { '#'**1..6 }
    regex header {^^ <hashes> \h+ <span> <!after '#'> [\h* $<hashes>]? $$}
}
say Grammar::Headers.subparse("## Easier ##\n", :rule<header>);
Just change
regex header {^^ <hashes> \h+ <span> [\h* $0]? $$}
to
regex header {^^ (<hashes>) \h+ <span> [\h* $0]? $$}
so that the capture works. Thanks to Eugene Barsky for calling this.
I played with this a bit because I thought there were two interesting things you might do.
First, you can make hashes take an argument for how many hash marks it will match. That way you can do special things based on the level if you like, and you can reuse hashes in different parts of the grammar where you require different but exact numbers of hash marks.
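As a rough sketch of that idea in isolation (the grammar name Demo and the sample strings are hypothetical, chosen only for this illustration):

```raku
# Hypothetical mini-grammar: `hashes` takes the exact number of '#' to match.
grammar Demo {
    token TOP { <hashes(2)> \h+ \w+ }        # require exactly two hash marks
    token hashes($n = 1) { '#' ** {$n} }     # defaults to one hash mark
}

say so Demo.parse('## Title');   # True
say so Demo.parse('# Title');    # False: only one '#', but two are required
```

Another rule in the same grammar could call <hashes(3)> for a third-level header without a separate token per level.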
Next, the ~ stitcher allows you to specify that something will show up in the middle of two things, so you can put those wrapper things next to each other. For example, to match (Foo) you could write '(' ~ ')' Foo. With that, it looks like I came up with the same thing you posted:
use Grammar::Tracer;
role Like-a-word {
    regex like-a-word { \S+ }
}
role Span does Like-a-word {
    regex span { <like-a-word>[\s+ <like-a-word>]* }
}
grammar Grammar::Headers does Span {
    token TOP {^ <header> \v+ $}
    token hashes ( $n = 1 ) { '#' ** {$n} }
    regex header { [(<hashes(2)>) \h*] ~ [\h* $0] <span> }
}
my $result = Grammar::Headers.parse( "## Easier ##\n" );
say $result;
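For reference, the (Foo) example mentioned earlier for the ~ stitcher looks like this as a standalone regex. It is only a small sketch of the operator: the opening and closing delimiters sit on either side of the ~, and the inner pattern comes last.

```raku
# '(' ~ ')' Foo  reads as: '(' , then Foo , then ')'
my $m = '(Foo)' ~~ / '(' ~ ')' Foo /;

say $m.Str;   # (Foo)
```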