I'm implementing a PEG parser generator in Python, and I've had success so far, except with the "cut" feature, of which whomever knows Prolog must know about.
The idea is that after a cut (!
) symbol has been parsed, then no alternative options should be attempted at the same level.
expre = '(' ! list ')' | atom.
Means that after the (
is seen, the parsing must succeed, or fail without trying the second option.
I'm using Python's (very efficient) exception system to force backtracking, so I tried having a special FailedCut
exception that would abort the enclosing choice, but that didn't work.
Any pointers to how this functionality is implemented in other parser generators would be helpful.
Maybe the problem I've had has been lack of locality. The code generated for the left part of the rule would be something like:
cut_seen = False
try:
self.token('(')
cut_seen = True
self.call('list')
self.token(')')
except FailedParse as e:
if cut_seen:
raise FailedCut(e)
raise
Then the code generated for the choice (|
) operator will skip the following choices if it catches a FailedCut
. What I mean by lack of locality is that the choice catching the FailedCut
may be deep up in calls, thus having an effect too-difficult to discern.
Instead of making the code generated for sequences try to inform enclosing choices of cuts, I could make the code generated for choices beware of them. That would make the scope of cuts very local, unlike Prolog's, but good enough for what I want in a PEG parser, which is to commit to an option after a certain token sequence has been seen, so the error reporting is refers to that location in the source, instead of to another location where some other option might have been available.
It just occurred to me that if the code generated for a rule/predicate catches FailedCut
and translates it into a normal FailedParse
exception, then the cuts will have the right scope.
In reference to @false's question, here's a complete example of what I want to work:
start = expre ;
expre = named | term ;
named = word ':' ! term;
term = word ;
In that grammar, word
can be reached through named
or term
, but I would like the parser to commit to the named
branch after it has seen the :
.
To be fair, I've published my work so far at https://bitbucket.org/apalala/grako/.
In the final solution, sequences are enclosed with this context manager:
@contextmanager
def _sequence(self):
self._push_cut()
try:
yield
except FailedParse as e:
if self._cut():
self.error(e, FailedCut)
else:
raise
finally:
self._pop_cut()
And options in a choice function are enclosed with this:
@contextmanager
def _option(self):
p = self._pos
try:
self._push_ast()
try:
yield
ast = self.ast
finally:
self._pop_ast()
self.ast.update(ast)
except FailedCut as e:
self._goto(p)
raise e.nested
except FailedParse:
self._goto(p)
Which forces an exit out of the choice instead of a return to try the next option.
The cuts themselves are implemented thus:
def _cut(self):
self._cut_stack[-1] = True
The full source code may be found on Bitbucket.
In a Prolog with ISO Prolog's exception handling (catch/3
and throw/1
), a cut could be implemented as:
cut. % Simply succeeds
cut :-
throw(cut). % on backtracking throws an exception
This would require to catch that exception at appropriate places. For example, each goal (that is non-terminal) of a user defined predicate could now be wrapped with:
catchcut(Goal) :-
catch(Goal,cut,fail).
This is not the most efficient way to implement cut since it does not free resources upon success of !
, but it might be sufficient for your purposes. Also, this method now might interfere with user-defined uses of catch/3
. But you probably do not want to emulate the entire Prolog language in any case.
Also, consider to use Prolog's dcg-grammars directly. There is a lot of fine print that is not evident when implementing this in another language.
The solution proposed at the end of my question worked:
cut_seen = False
try:
self.token('(')
cut_seen = True
self.call('list')
self.token(')')
except FailedParse as e:
if cut_seen:
raise FailedCut(e)
raise
Then, any time a choice or optional is evaluated, the code looks like this:
p = self.pos
try:
# code for the expression
except FailedCut:
raise
except FailedParse:
self.goto(p)
Edit
The actual solution required keeping a "cut stack". The source code is int Bitbucket.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With