Is there a way to easily adapt the error messages of ANTLR4?

Tags: java, antlr, antlr4

Currently I'm working on my own grammar and I would like to have specific error messages for NoViableAlternative, InputMismatch, UnwantedToken, MissingToken and LexerNoViableAltException.

I have already extended Lexer and overridden notifyListeners to replace the default error message ("token recognition error at: ...") with my own. I have also extended DefaultErrorStrategy and overridden all of its report methods: reportNoViableAlternative, reportInputMismatch, reportUnwantedToken and reportMissingToken.

The purpose of all this is to change the messages that are passed to the syntaxError() method of the ANTLRErrorListener.

Here's a small example of the extended Lexer.class:

    @Override
    public void notifyListeners(LexerNoViableAltException lexerNoViableAltException) {
        String text = this._input.getText(Interval.of(this._tokenStartCharIndex, this._input.index()));
        String msg = "Operator " + this.getErrorDisplay(text) + " is unknown.";
        ANTLRErrorListener listener = this.getErrorListenerDispatch();
        listener.syntaxError(this, null, this._tokenStartLine, this._tokenStartCharPositionInLine, msg,
            lexerNoViableAltException);
    }

Or for the DefaultErrorStrategy:

    @Override
    protected void reportNoViableAlternative(Parser recognizer, NoViableAltException noViableAltException) {
        TokenStream tokens = recognizer.getInputStream();
        String input;
        if (tokens != null) {
            if (noViableAltException.getStartToken().getType() == Token.EOF) {
                input = "<EOF>";
            } else {
                input = tokens.getText(noViableAltException.getStartToken(), noViableAltException.getOffendingToken());
            }
        } else {
            input = "<unknown input>";
        }

        String msg = "Invalid operation " + input + ".";
        recognizer.notifyErrorListeners(noViableAltException.getOffendingToken(), msg, noViableAltException);
    }

I have read the thread "Handling errors in ANTLR4" and was wondering whether there is an easier way to customise these messages.

stefan94452 asked Sep 19 '19 12:09


1 Answer

My strategy for improving the ANTLR4 error messages is a bit different. I override syntaxError in my error listeners (I have one each for the lexer and the parser). By using the given values and a few other things, like the LL1Analyzer, you can create pretty precise error messages. The lexer error listener's handling is pretty straightforward (hopefully the C++ code is understandable for you):

void LexerErrorListener::syntaxError(Recognizer *recognizer, Token *, size_t line,
                                     size_t charPositionInLine, const std::string &, std::exception_ptr ep) {
  // The passed in string is the ANTLR generated error message which we want to improve here.
  // The token reference is always null in a lexer error.
  std::string message;
  try {
    std::rethrow_exception(ep);
  } catch (LexerNoViableAltException &) {
    Lexer *lexer = dynamic_cast<Lexer *>(recognizer);
    CharStream *input = lexer->getInputStream();
    std::string text = lexer->getErrorDisplay(input->getText(misc::Interval(lexer->tokenStartCharIndex, input->index())));
    if (text.empty())
      text = " "; // Should never happen.

    switch (text[0]) {
      case '/':
        message = "Unfinished multiline comment";
        break;
      case '"':
        message = "Unfinished double quoted string literal";
        break;
      case '\'':
        message = "Unfinished single quoted string literal";
        break;
      case '`':
        message = "Unfinished back tick quoted string literal";
        break;

      default:
        // Hex or bin string?
        if (text.size() > 1 && text[1] == '\'' && (text[0] == 'x' || text[0] == 'b')) {
          message = std::string("Unfinished ") + (text[0] == 'x' ? "hex" : "binary") + " string literal";
          break;
        }

        // Something else the lexer couldn't make sense of (likely there is no rule that accepts this input).
        message = "\"" + text + "\" is no valid input at all";
        break;
    }
    owner->addError(message, 0, lexer->tokenStartCharIndex, line, charPositionInLine,
                    input->index() - lexer->tokenStartCharIndex);
  }
}

This code shows that we don't use the original message at all and instead examine the token text to see what's wrong. Here we mostly deal with unclosed strings:

[image not preserved]
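Since the question is tagged Java, here is a rough Java sketch of the same classification step. It is only an illustration: LexerMessages and describe are hypothetical names, not part of ANTLR, and the helper works on a plain String so it has no dependency on the runtime. In a real listener the text would be obtained the way the question already does it, via getErrorDisplay on the interval from the token start index to the current input index.

```java
// Hypothetical helper, not part of ANTLR: maps the text the lexer could not
// tokenize to a friendlier message by looking at its first characters.
final class LexerMessages {
    private LexerMessages() {}

    static String describe(String text) {
        if (text == null || text.isEmpty()) {
            text = " "; // Defensive default, mirrors the C++ version above.
        }
        char first = text.charAt(0);
        switch (first) {
            case '/':
                return "Unfinished multiline comment";
            case '"':
                return "Unfinished double quoted string literal";
            case '\'':
                return "Unfinished single quoted string literal";
            case '`':
                return "Unfinished back tick quoted string literal";
            default:
                // Hex or binary string literal (x'... or b'...) without a closing quote?
                if (text.length() > 1 && text.charAt(1) == '\''
                        && (first == 'x' || first == 'b')) {
                    return "Unfinished " + (first == 'x' ? "hex" : "binary") + " string literal";
                }
                // Anything else: most likely no lexer rule accepts this input.
                return "\"" + text + "\" is no valid input at all";
        }
    }
}
```

Keeping the classification in a pure function like this makes it easy to unit test without building a lexer; the listener then only has to extract the text and forward the result.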

The parser error listener is much more complicated and too large to post here. It's a combination of different sources to construct the actual error message:

  • Parser.getExpectedTokens(): uses the LL1Analyzer to get the next possible lexer tokens from a given position in the ATN (the so-called follow-set). It looks through predicates, however, rather than evaluating them, which might be a problem if your grammar uses them.

  • Identifiers & keywords: certain keywords are often allowed as normal identifiers in specific situations. This produces follow-sets containing keywords that are actually meant as identifiers, so an extra check is needed to avoid showing them as expected values:

[image not preserved]

  • Parser rule invocation stack: during the call to the error listener the parser still has the current parser rule context (Parser.getRuleContext()), which you can use to walk up the invocation stack and find rule contexts that give more specific information about the error location (for example, walking up from a * match to a hypothetical expr rule tells you that an expression is expected at this point).

  • The given exception: if this is null, the error is about a missing or unwanted single token, which is pretty easy to handle. If the exception has a value, you can examine it for further details. Note that the content of the exception is not used (it is pretty sparse anyway); instead we use the values collected previously. The most common exception types are NoViableAlt and InputMismatch, both of which you can translate to either "input is incomplete" when the error position is EOF or something like "input is not valid at this position". Both can then be enhanced with an expectation constructed from the rule invocation stack and/or the follow-set, as mentioned above (and shown in the image).
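To make the list above a little more concrete, here is a minimal Java sketch of how those pieces might be combined. It is not the actual listener (which, as said, is too large to post) and everything in it is an assumption for illustration: the class ParserMessages, its method names, the token name "IDENTIFIER" and the rule-to-hint map are all hypothetical. The helper takes plain strings so it runs without ANTLR; in a real listener the expected names would presumably come from Parser.getExpectedTokens() translated through the parser's Vocabulary, and the rule names from Parser.getRuleInvocationStack().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of ANTLR.
final class ParserMessages {
    private ParserMessages() {}

    // Drops keywords that only appear in the follow-set because they may also
    // be used as identifiers. "IDENTIFIER" is an assumed token name.
    static List<String> filterExpected(List<String> expected, Set<String> identifierKeywords) {
        boolean identifierAllowed = expected.contains("IDENTIFIER");
        List<String> result = new ArrayList<>();
        for (String name : expected) {
            if (identifierAllowed && identifierKeywords.contains(name)) {
                continue; // Already covered by IDENTIFIER, do not list it separately.
            }
            result.add(name);
        }
        return result;
    }

    // Walks the invocation stack (innermost rule first) and returns the first
    // human-readable hint registered for a rule, or null if there is none.
    static String ruleHint(List<String> invocationStack, Map<String, String> hints) {
        for (String rule : invocationStack) {
            String hint = hints.get(rule);
            if (hint != null) {
                return hint;
            }
        }
        return null;
    }

    // Builds the final message; a rule hint takes precedence over the raw follow-set.
    static String describe(boolean atEof, String ruleHint, List<String> expected) {
        StringBuilder sb = new StringBuilder(
            atEof ? "input is incomplete" : "input is not valid at this position");
        if (ruleHint != null) {
            sb.append(", expected ").append(ruleHint);
        } else if (!expected.isEmpty()) {
            sb.append(", expected ").append(String.join(", ", expected));
        }
        return sb.toString();
    }
}
```

For example, with the stack [primary, expr, statement] and a hint "an expression" registered for expr, describe(false, hint, ...) yields "input is not valid at this position, expected an expression". Preferring the rule hint over the token list is a design choice of this sketch; a real implementation may well weigh the two differently.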

Mike Lischke answered Sep 23 '22 00:09