
How to force ANTLR to generate NoViableAltException?

I'm working with ANTLR 3.2. I have a simple grammar that consists of atoms (each either the character "0" or "1") and a rule that collects a comma-separated sequence of them into a list.

When I pass in "00" as input, I don't get an error, which surprises me because this should not be valid input:

C:\Users\dan\workspace\antlrtest\test>java -cp antlr-3.2.jar org.antlr.Tool Test.g
C:\Users\dan\workspace\antlrtest\test>javac -cp antlr-3.2.jar *.java
C:\Users\dan\workspace\antlrtest\test>java -cp .;antlr-3.2.jar TestParser
[0]

How can I force an error to be generated in this case? It's particularly puzzling because when I use the interpreter in ANTLRWorks on this input, it does show a NoViableAltException.

I find that if I change the grammar to require, say, a semicolon at the end, an error is generated, but that solution isn't available to me in the real grammar I am working on.

Here is the grammar, which is self-contained and runnable:

grammar Test;

@parser::members {
  public static void main(String[] args) throws Exception {
    String text = "00";
    ANTLRStringStream in = new ANTLRStringStream(text);
    TestLexer lexer = new TestLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    System.out.println(new TestParser(tokens).mainRule());
  }
}

mainRule returns [List<String> words]
@init{$words = new ArrayList<String>();}
  :  w=atom {$words.add($w.text);} (',' w=atom {$words.add($w.text);} )*
  ;


atom: '0' | '1';

WS
  :  ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; }
  ;
Dan Becker asked Nov 06 '22 16:11


1 Answer

At the end of your mainRule, you should add an EOF token. Otherwise the parser stops as soon as the rule can match nothing more and silently ignores the rest of the input.

Also, the atom rule should really be a lexer rule instead of a parser rule (lexer rule names start with a capital letter).

Try this instead:

grammar Test;

@parser::members {
  public static void main(String[] args) throws Exception {
    String text = "0,1  ,  1  , 0,1";
    ANTLRStringStream in = new ANTLRStringStream(text);
    TestLexer lexer = new TestLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    System.out.println(new TestParser(tokens).mainRule());
  }
}

mainRule returns [List<String> words]
@init{$words = new ArrayList<String>();}
  :  w=Atom {$words.add($w.text);} (',' w=Atom {$words.add($w.text);} )* EOF
  ;

Atom
  :  '0' | '1'
  ;

WS
  :  ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; }
  ;

EDIT

To clarify: as you already found out, EOF is not mandatory. It only forces the parser to go through the entire input. A NoViableAltException is thrown when none of the alternatives at a decision point can match the upcoming input; with the grammar above, that in practice means the lexer has stumbled upon a character that is not handled by your lexer grammar. Since you define three tokens in your grammar (0, 1 and ,) and your input, "00", does not contain any characters not handled by your grammar, no NoViableAltException is thrown. If you change your input to something like "0?0", then a NoViableAltException will pop up.
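As a rough illustration of that point, here is a hand-written stand-in for the lexer, in plain Java with no ANTLR dependency. The class name LexerSketch and the method tokenize are made up for this sketch and are not part of ANTLR. It only shows why "00" gets through the lexing stage while "0?0" does not. A generated ANTLR 3 lexer differs in detail: by default it reports the error and tries to recover rather than throwing out of the lexer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generated lexer; not ANTLR code.
public class LexerSketch {

    // Splits the input into single-character tokens and skips whitespace.
    // Throws if a character is not one of '0', '1' or ','.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f') {
                continue; // whitespace never reaches the parser, like the WS rule
            }
            if (c == '0' || c == '1' || c == ',') {
                tokens.add(String.valueOf(c));
            } else {
                // No lexer alternative matches this character.
                throw new IllegalArgumentException(
                    "no viable alternative at character '" + c + "' (index " + i + ")");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Both characters are known tokens, so lexing succeeds.
        System.out.println(tokenize("00"));
        try {
            tokenize("0?0");
        } catch (IllegalArgumentException e) {
            System.out.println("lexer error: " + e.getMessage());
        }
    }
}
```

The input "00" produces two perfectly valid tokens, so any complaint about it has to come from the parser, not the lexer.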

Since your parser finds the first 0 and then does not find a comma, it simply stops parsing, because you did not "tell" it to parse all the way to the end of the input.
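The same behaviour can be reproduced with a small hand-written recursive-descent parser. This is only a sketch in plain Java, not what ANTLR generates: the class name PrefixParseSketch, the method parse and the requireEof flag are hypothetical names chosen for the example. With requireEof set to false it mirrors the original mainRule, and with true it mirrors the version that ends in EOF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written parser for: atom (',' atom)* [EOF]
public class PrefixParseSketch {

    static List<String> parse(String text, boolean requireEof) {
        // Whitespace is ignored; indices below refer to the stripped string.
        String s = text.replaceAll("\\s+", "");
        List<String> words = new ArrayList<>();
        int pos = atom(s, 0, words);
        // The loop simply ends when the next character is not a comma.
        while (pos < s.length() && s.charAt(pos) == ',') {
            pos = atom(s, pos + 1, words);
        }
        // Without this check, leftover input is silently ignored.
        if (requireEof && pos != s.length()) {
            throw new IllegalStateException(
                "expected end of input at index " + pos
                + " but found '" + s.charAt(pos) + "'");
        }
        return words;
    }

    // Matches a single '0' or '1' and returns the position after it.
    private static int atom(String s, int pos, List<String> words) {
        if (pos < s.length() && (s.charAt(pos) == '0' || s.charAt(pos) == '1')) {
            words.add(String.valueOf(s.charAt(pos)));
            return pos + 1;
        }
        throw new IllegalStateException("expected '0' or '1' at index " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("00", false)); // prints [0]
        try {
            parse("00", true);
        } catch (IllegalStateException e) {
            System.out.println("parse error: " + e.getMessage());
        }
    }
}
```

Without the end-of-input check, the second 0 is never looked at, which is why the original program printed [0] and reported nothing.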

Hope that clarifies things. If not, let me know.

Bart Kiers answered Nov 23 '22 23:11