
PEG.js Get any text between ( and );

I'm trying to capture some text between parentheses, terminated by a semicolon.

Example: (in here there can be 'anything' !"#¤);); any character is possible);

I've tried this:

Text
 = "(" text:(.*) ");" { return text.join(""); }

But it seems (.*) consumes the final ); before ");" gets a chance to match, and I get the error:

Expected ");" or any character but end of input found

The problem is that the text can contain ");", so I want the outermost ); to decide where the line ends.

This regex \((.*)\); does what I want, but how can I do the same in PEG.js? I don't want to include the outer parentheses and semicolon in the result.

This seems like it should be quite easy if you know what you're doing =P

mottosson asked Dec 19 '22 13:12


1 Answer

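The key point is that a PEG never backtracks into a repetition once it has consumed input, while a regex engine does. In your rule, .* greedily takes every remaining character, including the final );, and never gives any of them back, so ");" is tried at the end of the input and fails. We can still simulate the semantics you want: since you say the regex \((.*)\); does what you want, we can translate it to a PEG.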
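To make the failure concrete, here is a minimal hand-written imitation of the rule "(" text:(.*) ");" in plain JavaScript. It is only a sketch of the matching order, not PEG.js itself; the function name naiveText and the sample input are made up for illustration.

```javascript
// Hand-rolled imitation of the PEG rule  "(" text:(.*) ");"
// (hypothetical helper, not part of PEG.js).
function naiveText(input) {
  let pos = 0;

  // "(" must match first.
  if (input[pos] !== "(") return { ok: false, pos };
  pos += 1;

  // `.*` in a PEG: take every remaining character and never give any back.
  const start = pos;
  pos = input.length;

  // ");" is now tried at the current position, which is the end of input.
  if (input.startsWith(");", pos)) {
    return { ok: true, text: input.slice(start, pos) };
  }
  return { ok: false, pos };
}

// Fails at the end of the input even though the string ends in ");".
const result = naiveText("(abc);");
console.log(result.ok, result.pos); // false 6
```

The failure position is the end of the input, which matches the "end of input found" part of the error message in the question.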

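What does this regex do? The greedy .* first consumes every character up to the end of the input (or of the line, since in most regex flavors . does not match a newline by default), then backtracks one character at a time until \); matches. In other words, it ends at the last possible );.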
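A quick way to check this behavior is to run the regex in JavaScript. The sample string below is made up for illustration and contains an inner ");" on purpose.

```javascript
// Greedy `.*` runs to the end of the line, then backs off one character
// at a time until `\);` can match, so the LAST ");" closes the match.
const pattern = /\((.*)\);/;

// Hypothetical sample input with an inner ");".
const sample = "(a);b);";

const match = pattern.exec(sample);
const captured = match ? match[1] : null;

console.log(captured); // a);b
```

The capture group excludes the outer parenthesis and the closing ");", which is the result the grammar needs to reproduce.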

To make this work with a PEG, we can use a lookahead and keep consuming a character only if another ); still lies ahead of it.

So, a solution is:

Text
 = "(" text:TextUntilTerminator ");" { return text.join(""); }

TextUntilTerminator
 = x:(&HaveTerminatorAhead .)* { return x.map(y => y[1]) }

HaveTerminatorAhead
 = . (!");" .)* ");"

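The TextUntilTerminator non-terminal first checks HaveTerminatorAhead as a lookahead (the & operator), which matches without consuming anything, and then consumes one single character. It repeats this until the only ); left is the final one in the input. Each iteration yields a pair of lookahead result and character, which is why the action keeps only y[1].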

The HaveTerminatorAhead non-terminal is simple: it checks that there is one character ahead and, if so, guarantees that at least one ); follows it. The negative lookahead ! makes the inner loop stop at the first ); it sees instead of consuming it, which would reproduce your original problem.

This PEG, then, reproduces the behavior of the regex you suggested.
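For readers who want to trace the rules step by step, here is a rough plain-JavaScript sketch that mirrors the three non-terminals. It is not generated by PEG.js; the function names are hypothetical and simply follow the rule names, and the end-of-input check imitates the fact that a parse must consume the whole input.

```javascript
// Mirrors  HaveTerminatorAhead = . (!");" .)* ");"
// True when a character exists at `pos` and some ");" starts after it.
function haveTerminatorAhead(input, pos) {
  if (pos >= input.length) return false; // `.` needs one character
  let p = pos + 1;
  // (!");" .)*  : skip characters until a ");" starts at p or input ends
  while (p < input.length && !input.startsWith(");", p)) p += 1;
  return input.startsWith(");", p); // ");"
}

// Mirrors  TextUntilTerminator = (&HaveTerminatorAhead .)*
// Returns the position where consumption stops.
function textUntilTerminator(input, pos) {
  let p = pos;
  // Lookahead first (consumes nothing), then consume one character.
  while (haveTerminatorAhead(input, p)) p += 1;
  return p;
}

// Mirrors  Text = "(" text:TextUntilTerminator ");"
function parseText(input) {
  if (!input.startsWith("(")) throw new SyntaxError('Expected "("');
  const end = textUntilTerminator(input, 1);
  if (!input.startsWith(");", end)) throw new SyntaxError('Expected ");"');
  if (end + 2 !== input.length) throw new SyntaxError("Expected end of input");
  return input.slice(1, end);
}

console.log(parseText("(a);b);")); // a);b
```

Note that this sketch rescans the rest of the input at every step, so it takes quadratic time in the worst case; it is meant to show the order of the checks, not to be fast.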

paulotorrens answered Dec 21 '22 02:12