I'm working with ANTLR 3.2. I have a simple grammar that consists of atoms (each either the character "0" or "1") and a rule that accumulates a comma-separated sequence of them into a list.
When I pass in "00" as input, I don't get an error, which surprises me because this should not be valid input:
C:\Users\dan\workspace\antlrtest\test>java -cp antlr-3.2.jar org.antlr.Tool Test.g
C:\Users\dan\workspace\antlrtest\test>javac -cp antlr-3.2.jar *.java
C:\Users\dan\workspace\antlrtest\test>java -cp .;antlr-3.2.jar TestParser
[0]
How can I force an error to be generated in this case? It's particularly puzzling because when I use the interpreter in ANTLRWorks on this input, it does show a NoViableAltException.
I find that if I change the grammar to require, say, a semicolon at the end, an error is generated, but that solution isn't available to me in the real grammar I am working on.
Here is the grammar, which is self-contained and runnable:
grammar Test;

@parser::members {
  public static void main(String[] args) throws Exception {
    String text = "00";
    ANTLRStringStream in = new ANTLRStringStream(text);
    TestLexer lexer = new TestLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    System.out.println(new TestParser(tokens).mainRule());
  }
}

mainRule returns [List<String> words]
@init{$words = new ArrayList<String>();}
  : w=atom {$words.add($w.text);} (',' w=atom {$words.add($w.text);} )*
  ;

atom: '0' | '1';

WS
  : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; }
  ;
At the end of your mainRule, you should add an EOF token; otherwise ANTLR will stop parsing when there is no valid token to be matched.
Also, the atom rule should really be a lexer rule instead of a parser rule (lexer rule names start with a capital letter).
Try this instead:
grammar Test;

@parser::members {
  public static void main(String[] args) throws Exception {
    String text = "0,1 , 1 , 0,1";
    ANTLRStringStream in = new ANTLRStringStream(text);
    TestLexer lexer = new TestLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    System.out.println(new TestParser(tokens).mainRule());
  }
}

mainRule returns [List<String> words]
@init{$words = new ArrayList<String>();}
  : w=Atom {$words.add($w.text);} (',' w=Atom {$words.add($w.text);} )* EOF
  ;

Atom
  : '0' | '1'
  ;

WS
  : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; }
  ;
EDIT
To clarify: as you already found out, EOF is not mandatory. It will only cause the parser to go through the entire input. A NoViableAltException is only thrown when the lexer stumbles upon a token/char that is not handled by your lexer grammar. Since you define three tokens in your grammar (0, 1 and ,) and your input, "00", does not contain any characters not handled by your grammar, no NoViableAltException is thrown. If you change your input to something like "0?0", then a NoViableAltException will pop up.
Since your parser finds the first 0 and then does not find a , it simply stops parsing, because you did not "tell" it to parse all the way to the end of the file.
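To make the "stops parsing" behaviour concrete, here is a small hand-written sketch of the same idea in plain Java. It is not the code ANTLR generates: the class name AtomListSketch, the requireEof flag and the exception types are all made up for illustration, and a real ANTLR parser reports errors through its own mechanism rather than the exceptions used here. It only shows that a loop of the form atom (',' atom)* exits quietly on the second 0 unless something checks that the input is exhausted.

```java
import java.util.ArrayList;
import java.util.List;

public class AtomListSketch {
    // Stand-in for the lexer: one token per non-whitespace character.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (char c : text.toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue; // whitespace is dropped, roughly like the HIDDEN channel
            }
            if (c != '0' && c != '1' && c != ',') {
                throw new IllegalArgumentException("lexer: unexpected character '" + c + "'");
            }
            tokens.add(String.valueOf(c));
        }
        return tokens;
    }

    // Mirrors: atom (',' atom)* [EOF]
    static List<String> parse(String text, boolean requireEof) {
        List<String> tokens = tokenize(text);
        List<String> words = new ArrayList<>();
        int pos = atom(tokens, 0, words);
        // The loop only continues while the next token is a comma.
        while (pos < tokens.size() && tokens.get(pos).equals(",")) {
            pos = atom(tokens, pos + 1, words);
        }
        // Without this check, leftover tokens are silently ignored.
        if (requireEof && pos != tokens.size()) {
            throw new IllegalStateException(
                "parser: expected end of input but found '" + tokens.get(pos) + "'");
        }
        return words;
    }

    // Consumes one '0' or '1' token and returns the next position.
    static int atom(List<String> tokens, int pos, List<String> words) {
        if (pos >= tokens.size() || tokens.get(pos).equals(",")) {
            throw new IllegalStateException("parser: expected '0' or '1' at token " + pos);
        }
        words.add(tokens.get(pos));
        return pos + 1;
    }

    public static void main(String[] args) {
        System.out.println(parse("00", false)); // prints [0]
        try {
            parse("00", true);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(parse("0,1 , 1 , 0,1", true)); // prints [0, 1, 1, 0, 1]
    }
}
```

With requireEof set to false, "00" yields [0] just like your original grammar; with it set to true, the leftover 0 is rejected, which is the role EOF plays at the end of mainRule. An input such as "0?0" already fails in tokenize, since ? is not one of the three known characters.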
Hope that clarifies things. If not, let me know.