I'm trying to make a parser using pegjs. I need to parse something like:
blah blah START Lorem ipsum
dolor sit amet, consectetur
adipiscing elit END foo bar
etc.
I have trouble writing the rule to catch the text from "START" to "END".
Use negative lookahead predicates:
phrase
  = (!"START" .)* "START" result:(!"END" .)* "END" .* {
      // Each element of result is a [predicateResult, char] pair;
      // drop the empty element added by predicate matching.
      for (var i = 0; i < result.length; ++i) {
        result[i] = result[i][1];
      }
      return result.join("");
    }
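You need to use a negative predicate for END as well as START because repetition in pegjs is greedy and never backtracks: a bare .* would consume the rest of the input, END included, and the following "END" could then never match.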
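To make the greedy-repetition point concrete, here is a minimal hand-rolled sketch in plain JavaScript. It is not PEG.js internals and does not use the PEG.js API; the function names matchGreedy and matchWithPredicate are made up for illustration, and each one only imitates how the corresponding PEG expression consumes input.

```javascript
// Hand-rolled imitation of two PEG repetitions (hypothetical helpers).

// Models `.* "END"`: `.*` consumes every character and PEG repetition
// does not give any back, so the literal "END" has nothing left to match.
function matchGreedy(input) {
  let pos = 0;
  while (pos < input.length) pos++; // `.*` runs to the end of the input
  return input.startsWith("END", pos) ? input.slice(0, pos) : null;
}

// Models `(!"END" .)* "END"`: stop consuming as soon as "END" comes next.
function matchWithPredicate(input) {
  let pos = 0;
  while (pos < input.length && !input.startsWith("END", pos)) pos++;
  return input.startsWith("END", pos) ? input.slice(0, pos) : null;
}

console.log(matchGreedy(" Lorem ipsum END foo"));        // null
console.log(matchWithPredicate(" Lorem ipsum END foo")); // " Lorem ipsum "
```

The same reasoning applies to the leading (!"START" .)* part of the rule.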
Alternatively, the action could be written as
{ return result.join("").split(",").join(""); }
although this relies on how join stringifies nested arrays (each sub-array is itself joined with commas before the pieces are concatenated), and it also strips any literal commas from the captured text, such as the one after "amet" in the example.
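The difference between the two actions can be checked without a generated parser. The sketch below fakes the value bound to result as a list of [predicateResult, char] pairs; fakeMatch is a hypothetical helper, and using undefined for the predicate slot is an assumption (an empty string in that slot would give the same output).

```javascript
// Simulate what `(!"END" .)*` hands to the action:
// one [predicateResult, char] pair per matched character.
// Assumption: the predicate slot holds undefined.
function fakeMatch(text) {
  return Array.from(text, (ch) => [undefined, ch]);
}

const result = fakeMatch("amet, elit");

// Approach 1: keep only the character from each pair.
const viaIndex = result.map((pair) => pair[1]).join("");

// Approach 2: let join stringify the nested arrays, then drop every comma.
const viaSplit = result.join("").split(",").join("");

console.log(viaIndex); // "amet, elit"
console.log(viaSplit); // "amet elit" (the literal comma is gone)
```

Picking the character by index keeps the text intact, which is a reason to prefer it whenever the captured text may contain commas.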
[UPDATE] A shorter way to deal with the empty elements is
phrase
  = (!"START" .)* "START" result:(t:(!"END" .) { return t[1]; })* "END" .* {
      return result.join("");
    }