Parsing code with syntax errors

Question

Parsing techniques are well described in CS literature. But the algorithms I know of require that the source is syntactically correct. If a syntax error is encountered, parsing is immediately aborted.

But IDE's (like Visual Studio) are typically able to provide meaningful code completion and other hints while typing, which mean the syntax is often not in a valid state. E.g. you type an opening parenthesis in a function call, and the IDE provide parameter hints for the function, even though the syntax is invalid until the closing parenthesis is typed.

It seems to me this must rely on some kind of guessing or error-tolerant parser. Anyone know what techniques or algorithms are used for this?

Ira Baxter · Accepted Answer

The standard trick is to do some kind of error repair using the parsing machinery to help make predictions.

For table-based parsers (such as LALR or GLR), when a syntax error occurs, the parser was recently in some state in which the error had not yet happened. One can record the parse stack to remember this before each shift (or alternatively record reductions before the error). Given that an error as been encountered, one can inspect the parse state for the saved stack to determine which tokens might be next (this is also how one can do code completion in terms of syntax tokens). A more sophisticated technique can invent the smallest possible sequence of tokens that allow a shift by the error token, or the smallest possible tree that could replace the error token and allow a shift on the next.

This isn't so easy with recursive descent parsers because there isn't a lot of information lying around with which make a predication. For error recovery, a cheesy trick is define error recovery points (e.g., where a "stmt" might be accepted) and continue scanning until a ";" is found and accept and "error stmt". This doesn't help if you want code completion.

SK-logic · Answer

Packrat is promising - it provides information on both successful and failed parsing attempt at key points, which can be recovered and used for smart error reporting, completion, hints and so on. For example, if the cursor is at a point where all the parsing attempts are marked as failed in a cache, a list of tokens tried can be given for completion options.

Parsing code with syntax errors

Tags:

language-agnostic

parsing

JacquesB

2 Answers

Ira Baxter

SK-logic

Recent Activity

Donate For Us

Parsing code with syntax errors

Tags:

language-agnostic

parsing

JacquesB

2 Answers

Ira Baxter

SK-logic

Related questions

Recent Activity

Donate For Us