We have the following fragment of an ANTLR grammar:
signed_int
    : SIGN? INT
    ;
INT : '0'..'9'+
    ;
When someone enters a numeric value everything is fine, but if they mistakenly type something like 1O (the digit one followed by a capital letter O), we get a cryptic error message like:
error 1 : Missing token at offset 14
near [Index: 0 (Start: 0-Stop: 0) ='<missing COLON>' type<24> Line: 26 LinePos:14]
: syntax error...
What is a good way to handle this type of error? I thought of defining a catch-all SYMBOL token type, but this led to too many errors when building the parser. I will continue looking into ANTLR error handling, but I thought I would post this here to look for some insights.
You should include an explicit EOF at the end of your entry rule any time you are trying to parse an entire input file. If you do not include the EOF, you are telling ANTLR that it does not have to parse the entire input, so it may parse only a portion of the input if doing so avoids a syntax error.
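The effect of a missing EOF can be illustrated without ANTLR at all. The sketch below is only an analogy built on java.util.regex, not a generated parser: lookingAt() behaves like an entry rule without EOF (it accepts a valid prefix and stops), while matches() behaves like an entry rule that ends in EOF. The class name EofAnalogy and the pattern are assumptions made for illustration, and SIGN is assumed to be '+' or '-' because the question does not show that rule.

```java
import java.util.regex.Pattern;

public class EofAnalogy {
    // Rough regex equivalent of: signed_int : SIGN? INT ;
    // Assumes SIGN is '+' or '-' (the question does not show that rule).
    private static final Pattern SIGNED_INT = Pattern.compile("[+-]?[0-9]+");

    // Like an entry rule without EOF: succeeds if a prefix of the input matches.
    public static boolean matchesPrefix(String input) {
        return SIGNED_INT.matcher(input).lookingAt();
    }

    // Like an entry rule ending in EOF: the whole input must match.
    public static boolean matchesWhole(String input) {
        return SIGNED_INT.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesPrefix("1O")); // true: "1" is accepted, "O" is ignored
        System.out.println(matchesWhole("1O"));  // false: "O" is left over
        System.out.println(matchesWhole("-42")); // true
    }
}
```

With the prefix-style check, the mistyped 1O is silently accepted as 1, which is the kind of partial parse an entry rule without EOF allows.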
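You should override the reportError method in both the lexer and the parser, so that the first recognition error throws an exception instead of being reported and recovered from. For the lexer, add this code to the lexer class (for example through the grammar's @lexer::members section, so that it survives regeneration):
    @Override
    public void reportError(RecognitionException e) {
        // Fail fast: wrap the ANTLR error in an unchecked exception.
        throw new RuntimeException(e);
    }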
Then, in the parser, add a static matches method that checks whether an input string matches the grammar, together with the same reportError override:
    public static boolean matches(String input) {
        try {
            regExLexer lexer = new regExLexer(new ANTLRStringStream(input));
            regExParser parser = new regExParser(new CommonTokenStream(lexer));
            parser.goal(); // invoke the grammar's start rule
            return true;
        } catch (RuntimeException e) {
            // Thrown by the reportError overrides on the first lexer or parser error.
            return false;
        } catch (Exception e) {
            return false;
        } catch (OutOfMemoryError e) {
            return false;
        }
    }

    // Parser-side override, identical to the one in the lexer.
    @Override
    public void reportError(RecognitionException e) {
        throw new RuntimeException(e);
    }
Then, in your own code, call the parser's static matches(input) method to check whether the given input matches the grammar. It returns true if the input matches and false otherwise, so when it returns false you can show the user any customized error message you like.
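As a rough sketch of what such a customized message could look like for the signed_int case, the helper below scans the input itself and reports the first character that cannot be part of a signed integer. It does not use ANTLR; the class name SignedIntDiagnostics and the method describeError are hypothetical, and SIGN is assumed to be '+' or '-' because the question does not show that rule. It is meant to be called only after matches has returned false.

```java
public class SignedIntDiagnostics {
    // Returns null when the input is a valid signed integer,
    // otherwise a message describing the first problem found.
    // Hypothetical helper; assumes SIGN is '+' or '-'.
    public static String describeError(String input) {
        int i = 0;
        if (i < input.length() && (input.charAt(i) == '+' || input.charAt(i) == '-')) {
            i++; // optional sign
        }
        int firstDigit = i;
        while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
            i++; // digits '0'..'9'
        }
        if (i == input.length()) {
            // Reached the end: valid only if at least one digit was seen.
            return i > firstDigit ? null
                    : "Expected a digit at offset " + i + " but the input ended";
        }
        return "Unexpected character '" + input.charAt(i) + "' at offset " + i
                + " in signed integer";
    }

    public static void main(String[] args) {
        System.out.println(describeError("1O"));
        // prints: Unexpected character 'O' at offset 1 in signed integer
        System.out.println(describeError("-42")); // prints: null
    }
}
```

A message like this points the user at the mistyped letter O directly, instead of at a missing token the parser expected somewhere later in the input.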