
Abort on parse error with useful message

I've got an ANTLR 4 grammar and built a lexer and parser from that. Now I'm trying to instantiate that parser in such a way that it will parse until it encounters an error. If it encounters an error, it should not continue parsing, but it should provide useful information about the problem; ideally a machine-readable location and a human-readable message.

Here is what I have at the moment:

grammar Toy;

@parser::members {

    public static void main(String[] args) {
        for (String arg: args)
            System.out.println(arg + " => " + parse(arg));
    }

    public static String parse(String code) {
        ErrorListener errorListener = new ErrorListener();
        CharStream cstream = new ANTLRInputStream(code);
        ToyLexer lexer = new ToyLexer(cstream);
        lexer.removeErrorListeners();
        lexer.addErrorListener(errorListener);
        TokenStream tstream = new CommonTokenStream(lexer);
        ToyParser parser = new ToyParser(tstream);
        parser.removeErrorListeners();
        parser.addErrorListener(errorListener);
        parser.setErrorHandler(new BailErrorStrategy());
        try {
            String res = parser.top().str;
            if (errorListener.message != null)
                return "Parsed, but " + errorListener.toString();
            return res;
        } catch (ParseCancellationException e) {
            if (errorListener.message != null)
                return "Failed, because " + errorListener.toString();
            throw e;
        }
    }

    static class ErrorListener extends BaseErrorListener {

        String message = null;
        int start = -2, stop = -2, line = -2;

        @Override
        public void syntaxError(Recognizer<?, ?> recognizer,
                                Object offendingSymbol,
                                int line,
                                int charPositionInLine,
                                String msg,
                                RecognitionException e) {
            if (message != null) return;
            if (offendingSymbol instanceof Token) {
                Token t = (Token) offendingSymbol;
                start = t.getStartIndex();
                stop = t.getStopIndex();
            } else if (recognizer instanceof ToyLexer) {
                ToyLexer lexer = (ToyLexer)recognizer;
                start = lexer._tokenStartCharIndex;
                stop = lexer._input.index();
            }
            this.line = line;
            message = msg;
        }

        @Override public String toString() {
            return start + "-" + stop + " l." + line + ": " + message;
        }
    }

}

top returns [String str]: e* EOF {$str = "All went well.";};
e: 'a' 'b' | 'a' 'c' e;

Save this to Toy.g, then try these commands:

> java -jar antlr-4.5.2-complete.jar Toy.g
> javac -cp antlr-4.5.2-complete.jar Toy*.java
> java -cp .:antlr-4.5.2-complete.jar ToyParser ab acab acc axb abc
ab => All went well.
acab => All went well.
acc => Failed, because 2-2 l.1: no viable alternative at input 'c'
axb => Parsed, but 1-1 l.1: token recognition error at: 'x'
Exception in thread "main" org.antlr.v4.runtime.misc.ParseCancellationException
    at org.antlr.v4.runtime.BailErrorStrategy.recoverInline(BailErrorStrategy.java:90)
    at org.antlr.v4.runtime.Parser.match(Parser.java:229)
    at ToyParser.top(ToyParser.java:187)
    at ToyParser.parse(ToyParser.java:95)
    at ToyParser.main(ToyParser.java:80)
Caused by: org.antlr.v4.runtime.InputMismatchException
    at org.antlr.v4.runtime.BailErrorStrategy.recoverInline(BailErrorStrategy.java:85)
    ... 4 more

On the one hand, I feel that I'm already doing too much. Looking at how much code I wrote for what should be a simple and common task, I can't help but wonder whether I'm missing some simpler solution. On the other hand, even that doesn't seem enough, for two reasons. Firstly, while I managed to get lexer errors reported, they still don't prevent the parser from continuing on the remaining stream. This is evidenced by the "Parsed, but" string for input axb. Secondly, I'm still left with errors which don't get reported to the error listener, as evidenced by the stack trace.

If I don't install the BailErrorStrategy, I get more useful output:

acc => Parsed, but 2-2 l.1: mismatched input 'c' expecting 'a'
axb => Parsed, but 1-1 l.1: token recognition error at: 'x'
abc => Parsed, but 2-2 l.1: extraneous input 'c' expecting {<EOF>, 'a'}

Is there any way to get this kind of error message but still bail on the first error? I can see from the sources that the extraneous input message is indeed generated by the DefaultErrorStrategy, apparently after it has worked out how it would go about fixing the issue. Should I let it do that and then bail out, i.e. write my own variant of BailErrorStrategy which calls super before throwing?

MvG asked Mar 10 '16 18:03




2 Answers

In the same situation I ended up extending DefaultErrorStrategy and overriding its report* methods. It's pretty straightforward (you can implement the ANTLRErrorStrategy interface directly as well).

Here you can find an example of a fail-fast strategy. I think in your case you can collect all errors in the same way and build a detailed report.
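
Since the linked example is not reproduced here, the following is only a minimal sketch of what such a strategy might look like, not the code the link pointed to. The class name ReportThenBailStrategy is made up for illustration. It assumes the ANTLR 4 runtime on the classpath and relies on DefaultErrorStrategy exposing reportError, reportUnwantedToken and reportMissingToken as overridable methods: each override lets the default implementation notify the listeners with its usual message, then aborts.

```java
import org.antlr.v4.runtime.DefaultErrorStrategy;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.misc.ParseCancellationException;

// Hypothetical name: report with the default messages, then stop parsing.
public class ReportThenBailStrategy extends DefaultErrorStrategy {

    // Called by the generated parser for mismatched input,
    // no viable alternative and failed predicates.
    @Override
    public void reportError(Parser recognizer, RecognitionException e) {
        super.reportError(recognizer, e);        // notifies the error listeners
        throw new ParseCancellationException(e); // abort instead of recovering
    }

    // "extraneous input ..." is reported here, not through reportError.
    @Override
    protected void reportUnwantedToken(Parser recognizer) {
        super.reportUnwantedToken(recognizer);
        throw new ParseCancellationException();
    }

    // "missing ... at ..." is reported here, not through reportError.
    @Override
    protected void reportMissingToken(Parser recognizer) {
        super.reportMissingToken(recognizer);
        throw new ParseCancellationException();
    }
}
```

It would be installed with parser.setErrorHandler(new ReportThenBailStrategy()) in place of BailErrorStrategy, keeping the existing catch block for ParseCancellationException. Note that an error strategy only affects the parser, so lexer errors would still need to be handled through the listener.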

vsminkov answered Nov 15 '22 15:11


One approach might be modifying the error listener instead of the error strategy. One could use the default strategy together with the following listener:

class ErrorListener extends BaseErrorListener {
    @Override
    public void syntaxError(Recognizer<?, ?> recognizer,
                            Object offendingSymbol,
                            int line,
                            int charPositionInLine,
                            String msg,
                            RecognitionException e) {
        throw new ParseException(msg, e, line);
    }
}

class ParseException extends RuntimeException {
    int line;
    public ParseException(String message, Throwable cause, int line) {
        super(message, cause);
        this.line = line;
    }
}

This way the errors keep the default message formatting, but the first error to be reported causes the parse to abort by throwing ParseException. Since this is an unchecked exception, you have to make sure to catch it yourself, because the compiler won't warn you if you forget to do so.
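
To illustrate the catching side, here is a small self-contained sketch that runs without ANTLR. The class ParseOrReport and the Supplier argument are assumptions for illustration only: the supplier stands in for the real work of building the lexer and parser and calling parser.top().str, and ParseException is repeated so the sketch compiles on its own.

```java
import java.util.function.Supplier;

public class ParseOrReport {

    // Same shape as the ParseException above, repeated for self-containment.
    static class ParseException extends RuntimeException {
        final int line;
        ParseException(String message, Throwable cause, int line) {
            super(message, cause);
            this.line = line;
        }
    }

    // The supplier is a hypothetical stand-in for the actual parser call.
    static String parseOrReport(Supplier<String> parseAction) {
        try {
            return parseAction.get();
        } catch (ParseException e) {
            // Unchecked, so nothing forces this catch; forgetting it
            // would let the exception escape to the caller.
            return "Failed at l." + e.line + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrReport(() -> "All went well."));
        System.out.println(parseOrReport(() -> {
            // Simulates what the listener would throw on the first error.
            throw new ParseException("mismatched input 'c' expecting 'a'", null, 1);
        }));
    }
}
```

Wrapping the parser call in one place like this keeps the catch from being forgotten at individual call sites.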

With regard to a machine-readable location: if, in addition to the line number, you also want source text offsets for the offending portion of the input, code like this seems to work inside the syntaxError method:

        int start = 0, stop = -1;
        if (offendingSymbol instanceof Token) {
            Token t = (Token) offendingSymbol;
            start = t.getStartIndex();
            stop = t.getStopIndex();
        } else if (recognizer instanceof Lexer) {
            Lexer lexer = (Lexer)recognizer;
            start = lexer._tokenStartCharIndex;
            stop = lexer._input.index();
        }
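
Once start and stop are available, they can also feed a human-readable marker. The helper below is a hypothetical sketch (the name ErrorExcerpt is not from ANTLR) that assumes single-line input without tabs and treats both offsets as inclusive, falling back to a single caret when stop is smaller than start, as with the defaults above.

```java
public class ErrorExcerpt {

    // Returns the input followed by a line of carets under [start, stop].
    static String underline(String code, int start, int stop) {
        // Clamp the start so that unknown or end-of-input offsets still work.
        int from = Math.max(0, Math.min(start, code.length()));
        // Always show at least one caret, even for an empty range.
        int width = Math.max(1, stop - from + 1);
        StringBuilder sb = new StringBuilder(code).append('\n');
        for (int i = 0; i < from; i++) sb.append(' ');
        for (int i = 0; i < width; i++) sb.append('^');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(underline("acc", 2, 2));
        System.out.println(underline("axb", 1, 1));
    }
}
```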

MvG answered Nov 15 '22 17:11