Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can parser error recovery be guided automatically by the grammar?

I'm writing an LALR parser generator as a pet project.

I'm using the purple dragon book to help me with the design, and what I gather from it is that there are four methods of error recovery in a parser:

  • Panic mode: Start dumping input symbols until a symbol pre-selected by the compiler designer is found
  • Phrase-level recovery: Modify the input string into something that allows the current production to reduce
  • Error productions: Anticipate errors by incorporating them into the grammar
  • Global correction: Way more complicated version of phrase-level recovery (as I understand it)

Two of these require modifying the input string (which I'd like to avoid), and the other two require the compiler designer to anticipate errors and design the error recovery based on their knowledge of the language. But the parser generator also has knowledge about the language, so I'm curious if there's a better way to recover from parsing errors without pre-selecting synchronizing tokens or filling up the grammar with error productions.

Instead of picking synchronizing tokens, can't the parser just treat symbols in the follow of all the nonterminals the current production can reduce to as synchronizing tokens? I haven't really worked out how well that would work - I visualize the parser being down a chain of in-progress productions but of course that's not how bottom-up parsers work. Would it produce too many irrelevant errors trying to find a workable state? Would it attempt to resume the parser in an invalid state? Is there a good way to pre-fill the parser table with valid error actions so the actual parsing program doesn't have to reason about where to go next when an error is encountered?

like image 266
Carson Myers Avatar asked Feb 16 '14 03:02

Carson Myers


People also ask

Which strategy is followed in panic mode recovery in parsing?

Panic mode recovery : The parser discards the input symbol one at a time until one of the designated (like end, semicolon) set of synchronizing tokens (are typically the statement or expression terminators) is found. This is adequate when the presence of multiple errors in the same statement is rare.

What should the parser do in an error case?

The parser requests the error handler to try and recover from the syntax error, and the error handler will recover from the error and give back the recovered node to the parser.

How can panic mode and phrase level recovery be implemented in LR parsers?

We can implement panic-mode error recovery by scanning down the stack until a state s with a goto on a particular nonterminal A is found. Zero or more input symbols are then discarded until a symbol a is found that can legitimately follow A. The parser then stacks the state GOTO(s, A) and resumes normal parsing.


1 Answers

It's way too easy to get lost in a dead-end when you try to blindly follow all available productions. There are things that you know about your language which it would be very difficult for the parser generator to figure out. (Like, for example, that skipping to the next statement delimiter is very likely to allow the parse to recover.)

That's not to say that automated procedures haven't been tried. There is a long section about it in Parsing Theory (Sippu & Soisalon-Soininen). (Unfortunately, this article is paywalled, but if you have an ACM membership or access to a good library, you can probably find it.)

On the whole, the yacc strategy has proven to be "not awful", and even "good enough". There is one well-known way of making it better, which is to collect really bad syntax error messages (or failed error recovery), trace them to the state which is active when they occur (which is easy to do), and attach an error recovery procedure to that precise state and lookahead token. See, for example, Russ Cox's approach.

like image 170
rici Avatar answered Sep 22 '22 06:09

rici