Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing "cut" in a recursive descent parser

I'm implementing a PEG parser generator in Python, and I've had success so far, except with the "cut" feature, of which whomever knows Prolog must know about.

The idea is that after a cut (!) symbol has been parsed, then no alternative options should be attempted at the same level.

expre = '(' ! list ')' | atom.

Means that after the ( is seen, the parsing must succeed, or fail without trying the second option.

I'm using Python's (very efficient) exception system to force backtracking, so I tried having a special FailedCut exception that would abort the enclosing choice, but that didn't work.

Any pointers to how this functionality is implemented in other parser generators would be helpful.

Maybe the problem I've had has been lack of locality. The code generated for the left part of the rule would be something like:

cut_seen = False
try:
    self.token('(')
    cut_seen = True 
    self.call('list')
    self.token(')')
except FailedParse as e:
    if cut_seen:
         raise FailedCut(e)
    raise

Then the code generated for the choice (|) operator will skip the following choices if it catches a FailedCut. What I mean by lack of locality is that the choice catching the FailedCut may be deep up in calls, thus having an effect too-difficult to discern.

Instead of making the code generated for sequences try to inform enclosing choices of cuts, I could make the code generated for choices beware of them. That would make the scope of cuts very local, unlike Prolog's, but good enough for what I want in a PEG parser, which is to commit to an option after a certain token sequence has been seen, so the error reporting is refers to that location in the source, instead of to another location where some other option might have been available.

It just occurred to me that if the code generated for a rule/predicate catches FailedCut and translates it into a normal FailedParse exception, then the cuts will have the right scope.

In reference to @false's question, here's a complete example of what I want to work:

start = expre ;

expre = named | term ;

named = word ':' ! term;

term = word ;

In that grammar, word can be reached through named or term, but I would like the parser to commit to the named branch after it has seen the :.

The Solution

To be fair, I've published my work so far at https://bitbucket.org/apalala/grako/.

In the final solution, sequences are enclosed with this context manager:

@contextmanager
def _sequence(self):
    self._push_cut()
    try:
        yield
    except FailedParse as e:
        if self._cut():
            self.error(e, FailedCut)
        else:
            raise
    finally:
        self._pop_cut()

And options in a choice function are enclosed with this:

@contextmanager
def _option(self):
    p = self._pos
    try:
        self._push_ast()
        try:
            yield
            ast = self.ast
        finally:
            self._pop_ast()
        self.ast.update(ast)
    except FailedCut as e:
        self._goto(p)
        raise e.nested
    except FailedParse:
        self._goto(p)

Which forces an exit out of the choice instead of a return to try the next option.

The cuts themselves are implemented thus:

def _cut(self):
    self._cut_stack[-1] = True

The full source code may be found on Bitbucket.

like image 639
Apalala Avatar asked Jan 03 '13 18:01

Apalala


2 Answers

In a Prolog with ISO Prolog's exception handling (catch/3 and throw/1), a cut could be implemented as:

cut. % Simply succeeds
cut :-
   throw(cut). % on backtracking throws an exception

This would require to catch that exception at appropriate places. For example, each goal (that is non-terminal) of a user defined predicate could now be wrapped with:

catchcut(Goal) :-
   catch(Goal,cut,fail).

This is not the most efficient way to implement cut since it does not free resources upon success of !, but it might be sufficient for your purposes. Also, this method now might interfere with user-defined uses of catch/3. But you probably do not want to emulate the entire Prolog language in any case.

Also, consider to use Prolog's dcg-grammars directly. There is a lot of fine print that is not evident when implementing this in another language.

like image 101
false Avatar answered Oct 01 '22 18:10

false


The solution proposed at the end of my question worked:

cut_seen = False
try:
    self.token('(')
    cut_seen = True 
    self.call('list')
    self.token(')')
except FailedParse as e:
    if cut_seen:
         raise FailedCut(e)
    raise

Then, any time a choice or optional is evaluated, the code looks like this:

p = self.pos
try:
   # code for the expression
except FailedCut:
    raise
except FailedParse:
    self.goto(p)

Edit

The actual solution required keeping a "cut stack". The source code is int Bitbucket.

like image 35
Apalala Avatar answered Oct 01 '22 18:10

Apalala