Can anybody point me at references on techniques for parsing code that contains syntax errors, or is missing necessary punctuation, for example?
The application that I'm working on is an IDE, where we'd like to provide features such as "jump to definition", auto-complete, and refactoring without requiring the source to be syntactically correct at the moment the functions are invoked.
Most parser code I've seen appears to work on the principle of "fail early", rather than focusing on error recovery or parsing partially-complete code.
I don’t know of any papers or tutorials, but uu-parsinglib is a Haskell parsing library that can recover from syntax errors in a general fashion. If, for example, ";" was expected but "int" was received, the parser can continue as though ";" were inserted at that source position.
It’s up to you where the parser will fail and where it will proceed with corrections, and the results will be delivered alongside a set of the errors corrected during parsing. Even if you don’t intend to implement your parsing code in Haskell, an examination of the library may offer you some insight. Or you can write a parser in Haskell and call it from C.
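To make the "continue as though the token were inserted" idea concrete, here is a minimal hand-written sketch in Python. It does not use or imitate uu-parsinglib's API; the toy grammar, the `Parser` class, and the `parse` helper are all made up for illustration. It only shows the general shape: a missing token is recorded as a correction rather than raised as a failure, and the result is returned together with the list of corrections.

```python
from dataclasses import dataclass, field


@dataclass
class Parser:
    """Toy parser for: program ::= decl* ; decl ::= 'int' NAME ';' (hypothetical grammar)."""
    tokens: list
    pos: int = 0
    errors: list = field(default_factory=list)  # corrections made while parsing

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "<eof>"

    def expect(self, wanted):
        """Consume `wanted`; if it is missing, record a correction and carry on."""
        if self.peek() == wanted:
            self.pos += 1
        else:
            # Recovery: behave as if `wanted` had been inserted here.
            # No input is consumed, so the caller continues with the same token.
            self.errors.append((self.pos, f"inserted {wanted!r} before {self.peek()!r}"))

    def declaration(self):
        self.expect("int")
        name = self.peek()
        self.pos += 1  # always consume one token as the name, so parsing makes progress
        self.expect(";")
        return ("decl", name)

    def program(self):
        decls = []
        while self.peek() != "<eof>":
            decls.append(self.declaration())
        return decls


def parse(tokens):
    """Return (parse result, list of corrections)."""
    p = Parser(tokens)
    return p.program(), p.errors
```

For the input `["int", "x", "int", "y", ";"]`, where the first declaration lacks its semicolon, this still produces both declarations and reports one correction at token position 2. An IDE feature such as "jump to definition" could then work from the recovered tree while the corrections are shown as diagnostics.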
Have you tried ANTLR?
In "The Definitive ANTLR Reference", Terence Parr spends about five pages on this in section 10.7, "Automatic Error Recovery Strategy". He references "Algorithms + Data Structures = Programs", "A Note on Error Recovery in Recursive Descent Parsers", and "Efficient and Comfortable Error Recovery in Recursive Descent Parsers".
Also see the pages from the web site:
Error reporting and recovery
ANTLR 3.0 Error Reporting and Recovery
Custom Syntax Error Recovery
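For readers who have not seen recursive-descent error recovery before, one classic technique in that literature is panic-mode resynchronization: on an error, report it once, skip input up to a synchronizing token, and resume parsing. The sketch below is a generic illustration in Python, not ANTLR's implementation or API; the statement grammar, the `SYNC` set, and `parse_statements` are assumptions made up for the example.

```python
# Tokens at which the parser is assumed to be able to resume safely.
SYNC = {";"}


def parse_statements(tokens):
    """Toy grammar: stmt ::= NAME '=' NUMBER ';'. Returns (statements, errors)."""
    stmts, errors, pos = [], [], 0
    while pos < len(tokens):
        window = tokens[pos:pos + 4]
        well_formed = (
            len(window) == 4
            and window[0].isidentifier()
            and window[1] == "="
            and window[2].isdigit()
            and window[3] == ";"
        )
        if well_formed:
            stmts.append(("assign", window[0], int(window[2])))
            pos += 4
            continue
        # Panic mode: report the error once, then skip to the next sync token.
        errors.append(f"syntax error at token {pos}: {tokens[pos]!r}")
        while pos < len(tokens) and tokens[pos] not in SYNC:
            pos += 1
        pos += 1  # step over the sync token itself (or past the end of input)
    return stmts, errors
```

With the input `"x = 1 ; y = = 2 ; z = 3 ;".split()`, the malformed middle statement is reported and dropped, while the statements on either side are still parsed. That is the property an IDE needs: one broken statement should not hide the definitions around it.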
Also check the ANTLR tag for a way to reach the ANTLR forum, where Terence Parr answers questions. He also answers some questions here as The ANTLR Guy.
Also, the new version, ANTLR 4, is due out, as is the accompanying book.
Sorry to sound like a sales pitch, but I have been using ANTLR for years because it is used by lots of people and in production systems, has a few solid versions (Java, C, C#), has a very active community, a web site, and books, is evolving and maintained, is open source under a BSD license, is easy to use, and has some GUI tools.
One of the people working on a GUI for ANTLR 4 that has syntax highlighting and auto-completion, among other useful IDE editing features, is Sam Harwell. If you can reach him through the ANTLR forum, he might be able to help you out.