Handling Antlr Syntax Errors or how to give a better message on unexpected token

We have the following sub-part of an Antlr grammar:

signed_int
    : SIGN? INT
    ;

INT : '0'..'9'+
    ;

When someone enters a numeric value everything is fine, but if they mistakenly type something like 1O (the digit one followed by a capital letter O), we get a cryptic error message like:

error 1 : Missing token  at offset 14
near [Index: 0 (Start: 0-Stop: 0) ='<missing COLON>'     type<24> Line: 26 LinePos:14]
 : syntax error...

What is a good way to handle this type of error? I thought of defining a catch-all SYMBOL token type, but this led to too many errors when building the parser. I will keep looking into ANTLR error handling, but I thought I would post this here in case anyone has some insights.
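As an illustration of the kind of message being asked for, here is a hand-written plain-Java sketch (not ANTLR-generated code) that checks the same shape as signed_int and names the first offending character. It assumes SIGN matches '+' or '-', since the SIGN rule is not shown above; SignedIntCheck and describeSignedIntError are hypothetical names.

```java
public class SignedIntCheck {

    /**
     * Hypothetical helper: returns null if input has the shape SIGN? INT
     * (SIGN assumed to be '+' or '-'), otherwise a message naming the first
     * character that breaks the pattern.
     */
    static String describeSignedIntError(String input) {
        int pos = 0;
        // Optional leading sign
        if (pos < input.length() && (input.charAt(pos) == '+' || input.charAt(pos) == '-')) {
            pos++;
        }
        if (pos == input.length()) {
            return "Expected a digit at offset " + pos + " but the input ended";
        }
        // One or more digits, as in INT : '0'..'9'+
        for (int i = pos; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < '0' || c > '9') {
                // Call out the common letter-O-for-zero typo explicitly
                String hint = (c == 'O' || c == 'o') ? " (did you mean the digit 0?)" : "";
                return "Unexpected character '" + c + "' at offset " + i
                        + ": a number may only contain the digits 0-9" + hint;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(describeSignedIntError("-10"));
        System.out.println(describeSignedIntError("1O"));
    }
}
```

For "1O" this yields: Unexpected character 'O' at offset 1: a number may only contain the digits 0-9 (did you mean the digit 0?)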

Asked Apr 30 '12 at 15:04 by Burton Samograd



1 Answer

You should override the reportError method in both the lexer and the parser. For the lexer, add this code to your lexer (in a combined ANTLR 3 grammar, inside an @lexer::members { ... } block):

@Override
public void reportError(RecognitionException e) {
    // Abort on the first lexing error instead of printing it and recovering
    throw new RuntimeException(e);
}
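To see why this override changes the behaviour, here is a toy model in plain Java (no ANTLR runtime; RecoveryModel, MiniRecognizer and the other names are hypothetical). It mirrors the general shape of ANTLR 3 generated code, where a rule catches RecognitionException, calls reportError and then tries to recover; making reportError throw turns that recovery into a fail-fast abort.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoveryModel {

    /** Hypothetical stand-in for a recognition error. */
    static class MiniRecognitionException extends Exception {
        final int offset;

        MiniRecognitionException(String msg, int offset) {
            super(msg);
            this.offset = offset;
        }
    }

    /** Toy recognizer: reads digits, reporting and skipping anything else. */
    static class MiniRecognizer {
        final List<String> messages = new ArrayList<>();

        // Default behaviour: record the error and let recognition continue
        public void reportError(MiniRecognitionException e) {
            messages.add(e.getMessage() + " at offset " + e.offset);
        }

        /** Returns the digits it managed to read. */
        public String digits(String input) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < input.length(); i++) {
                char c = input.charAt(i);
                try {
                    if (c < '0' || c > '9') {
                        throw new MiniRecognitionException("unexpected '" + c + "'", i);
                    }
                    out.append(c);
                } catch (MiniRecognitionException e) {
                    reportError(e); // report, as the generated code does...
                    // ...then "recover" by moving on to the next character
                }
            }
            return out.toString();
        }
    }

    /** Same idea as the override above: abort on the first error. */
    static class StrictRecognizer extends MiniRecognizer {
        @Override
        public void reportError(MiniRecognitionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        MiniRecognizer lenient = new MiniRecognizer();
        System.out.println(lenient.digits("1O2") + " " + lenient.messages);
        try {
            new StrictRecognizer().digits("1O2");
        } catch (RuntimeException e) {
            System.out.println("aborted: " + e.getCause().getMessage());
        }
    }
}
```

The lenient recognizer silently returns "12" for "1O2" and only logs the problem, whereas the strict one stops at the 'O', which is what lets the caller decide how to report the failure.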

Then add a matches method to the parser that checks whether an input string matches the grammar, along with the same reportError override:

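public static boolean matches(String input) {
    try {
        regExLexer lexer = new regExLexer(new ANTLRStringStream(input));
        regExParser parser = new regExParser(new CommonTokenStream(lexer));
        // goal is the start rule; end it with EOF so trailing input is rejected
        parser.goal();
        return true;
    } catch (RecognitionException e) {
        // Declared by the generated rule method
        return false;
    } catch (RuntimeException e) {
        // Thrown by the reportError overrides in the lexer and the parser
        return false;
    }
}

@Override
public void reportError(RecognitionException e) {
    // Abort on the first parse error instead of printing it and recovering
    throw new RuntimeException(e);
}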

Then call matches(input) on your parser class (regExParser.matches(input) in this example) to check whether the input matches the grammar. The method returns true on a match and false otherwise, so when it returns false you can show the user whatever custom error message you like.
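A minimal sketch of the calling side, assuming a matches-style predicate. The regular expression is only a stand-in for regExParser.matches so the example runs without ANTLR; CustomMessage, SIGNED_INT and validateNumber are hypothetical names.

```java
import java.util.function.Predicate;

public class CustomMessage {

    // Stand-in for the generated parser's matches(input). With ANTLR you
    // could pass regExParser::matches instead (assuming the signature above).
    static final Predicate<String> SIGNED_INT = s -> s.matches("[+-]?[0-9]+");

    /** Returns null if the input is valid, otherwise a message for the user. */
    static String validateNumber(String input, Predicate<String> matches) {
        if (matches.test(input)) {
            return null;
        }
        return "\"" + input + "\" is not a valid number: use an optional + or - "
                + "followed by the digits 0-9 (check for the letter O typed instead of zero)";
    }

    public static void main(String[] args) {
        System.out.println(validateNumber("42", SIGNED_INT));
        System.out.println(validateNumber("1O", SIGNED_INT));
    }
}
```

Because matches collapses every failure to false, the message cannot say where the problem is. If you need that, one option is to catch the RuntimeException instead and inspect its cause: in ANTLR 3 the wrapped RecognitionException carries public line and charPositionInLine fields.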

Answered Sep 20 '22 at 13:09 by sm13294