
Any references for parsing incomplete or incorrect code?

Can anybody point me to references on techniques for parsing code that contains syntax errors or is missing necessary punctuation?

The application I'm working on is an IDE, where we'd like to provide features like "jump to definition", auto-complete, and refactoring without requiring the source to be syntactically correct at the moment those features are invoked.

Most parser code I've seen appears to work on the principle of "fail early", rather than focusing on error recovery or parsing partially-complete code.

Mark Bessey asked Sep 14 '12 01:09


2 Answers

I don’t know of any papers or tutorials, but uu-parsinglib is a Haskell parsing library that can recover from syntax errors in a general fashion. If, for example, ; was expected but int was received, the parser can continue as though ; were inserted at that source position.

It’s up to you where the parser will fail and where it will proceed with corrections, and the results will be delivered alongside a set of the errors corrected during parsing. Even if you don’t intend to implement your parsing code in Haskell, an examination of the library may offer you some insight. Or you can write a parser in Haskell and call it from C.
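The insertion-style repair described above can be sketched without any particular library. What follows is a minimal Python illustration, not uu-parsinglib's actual algorithm or API: the toy grammar (declarations of the form `type name ;`), the `parse_decls` function, and the format of the correction messages are all hypothetical, chosen only to show a parser that continues as though the missing `;` were present and reports the repair alongside its result.

```python
from typing import List, Tuple

TYPES = {"int", "float"}  # hypothetical type keywords of the toy grammar


def parse_decls(tokens: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Parse a sequence of `type name ;` declarations.

    Returns (declarations, corrections). Missing tokens are treated as if
    they had been inserted, and each repair is recorded instead of aborting.
    """
    decls: List[Tuple[str, str]] = []
    corrections: List[str] = []
    pos = 0
    while pos < len(tokens):
        if tokens[pos] not in TYPES:
            # A declaration must start with a type: drop the stray token.
            corrections.append(f"deleted unexpected {tokens[pos]!r} at {pos}")
            pos += 1
            continue
        type_name = tokens[pos]
        pos += 1

        # Expect an identifier; if it is absent, pretend one was there.
        if pos < len(tokens) and tokens[pos] not in TYPES and tokens[pos] != ";":
            name = tokens[pos]
            pos += 1
        else:
            name = "<missing>"
            corrections.append(f"inserted identifier at {pos}")

        # Expect ';'; if something else follows, continue as though ';'
        # had been inserted at this position.
        if pos < len(tokens) and tokens[pos] == ";":
            pos += 1
        else:
            corrections.append(f"inserted ';' at {pos}")

        decls.append((type_name, name))
    return decls, corrections


decls, corrections = parse_decls(["int", "x", "int", "y", ";"])
# decls holds both declarations; corrections notes the ';' inserted before the second `int`.
```

An IDE feature such as "jump to definition" could then work from `decls` even though the source was incomplete, while `corrections` drives the error markers.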

Jon Purdy answered Oct 26 '22 05:10


Have you tried ANTLR?

In "The Definitive ANTLR Reference", section 10.7, "Automatic Error Recovery Strategy", Terence Parr spends five pages on this. He references "Algorithms + Data Structures = Programs", "A Note on Error Recovery in Recursive Descent Parsers", and "Efficient and Comfortable Error Recovery in Recursive Descent Parsers".

Also see the pages from the web site:

  • Error reporting and recovery

  • ANTLR 3.0 Error Reporting and Recovery

  • Custom Syntax Error Recovery
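One classic idea from the recursive-descent literature is panic-mode recovery: on an error, report it, skip tokens until a synchronizing token, and resume parsing. The sketch below is a generic Python illustration of that idea only. It is not ANTLR's implementation or API, and the toy grammar (`name = value ;`), the `SYNC` set, and the `parse_assignments` function are hypothetical.

```python
from typing import List, Tuple

SYNC = {";"}  # hypothetical synchronizing tokens: statement terminators


def parse_assignments(tokens: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Parse statements of the form `name = value ;`.

    On a malformed statement, record an error, skip ahead to the next
    synchronizing token, and resume with the statement after it.
    """
    stmts: List[Tuple[str, str]] = []
    errors: List[str] = []
    pos = 0
    while pos < len(tokens):
        window = tokens[pos:pos + 4]
        well_formed = (
            len(window) == 4
            and window[0].isidentifier()
            and window[1] == "="
            and window[2] not in SYNC and window[2] != "="
            and window[3] == ";"
        )
        if well_formed:
            stmts.append((window[0], window[2]))
            pos += 4
            continue

        # Panic mode: report once, then discard tokens up to and
        # including the next synchronizing token.
        errors.append(f"syntax error in statement starting at token {pos}")
        while pos < len(tokens) and tokens[pos] not in SYNC:
            pos += 1
        pos += 1  # consume the ';' itself (or step past the end of input)
    return stmts, errors


stmts, errors = parse_assignments("a = 1 ; b = = 2 ; c = 3 ;".split())
# The malformed middle statement is reported and skipped; the statements
# on either side of it are still parsed.
```

The trade-off is that everything between the error and the synchronizing token is thrown away, which is why the references above spend effort on choosing good synchronization sets.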

Also check the ANTLR tag for access to the ANTLR forum, where Terence Parr answers questions. He also answers some questions here as The ANTLR Guy.

Also, the new version, ANTLR 4, is due out, as is the book.

Sorry to sound like a sales pitch, but I have been using ANTLR for years because it is used by lots of people, is used in production systems, has a few solid versions (Java, C, C#), has a very active community, has a web site, has books, is evolving and maintained, is open source under a BSD license, is easy to use, and has some GUI tools.

One of the people working on a GUI for ANTLR 4, which has syntax highlighting and auto-completion among other useful IDE editing features, is Sam Harwell. If you can reach him through the ANTLR forum, he might be able to help you out.

Guy Coder answered Oct 26 '22 06:10