A week ago I started the following project: a grammar which recognizes suffixes of a Java code.
I used the official ANTLR
grammar for Java (Java.g4
) as a baseline and started to add some rules. However, those new rules also introduced left recursion which I also had to deal with.
After a couple of days of work I had the following code. When I started testing I noticed something unusual which I still can't explain. When given the input { }
the parser tells me no viable alternative at input '<EOF>'
but when I switch the order of the terminals in the right-handed side of the rule s2
, particularly if we change the right-handed side from v2_1 | v2_2 | v2_3 ...
to v2_36 | v2_1 | v2_2 ...
(the terminal v2_36
is moved to the first position), the sequence { }
gets accepted.
My first thoughts were that Antlr
does not backtrack because I noticed that with the input { }
the first version of the parser starts to follow the rule v2_3
and just reports that nothing is found and does not try to consider other options (that's what I think but maybe it's not true) like v2_36
which give exactly the positive answer.
But, after some research, I found out that ANTLR
actually does backtrack but only if everything else fails. At least this is true for v3.3 (read it in official ANTLR
paper) but I guess it's also true for v4
. Now I'm a little bit confused. After spending so many hours on this project I would feel really terrible if I don't make it work. Can someone gives some kind of tip or something? It would be greatly appreciated, thanks.
EDIT
Managed to isolate the problem to
grammar Java;
@parser::members {String ruleName; }
start : compilationUnitSuf EOF;
compilationUnitSuf
: {ruleName = "typeDeclarationSuf"; } s2
;
s2: '{' '}' v2_81 | '{' '}';
v2_81 : {ruleName.equals("enumBodyDeclarationsSuf")}? t173 | t173 '}';
t173: '}' | '{'*;
LBRACKET: '{';
RBRACKET: '}';
WS : [ \t\r\n\u000C]+ -> skip
;
So why does the predicting algorithm suggest me to follow s2 -> v'{' '}' v2_81 -> ...
instead of s2 -> '{' '}'
?
Add the package name that you want to see in the Java file in which the lexer and parser files will be created. Add the Language in which you want the output like Java , Python etc. Tick the generate parser tree listener and generate tree visitor if you want to modify the visitor. Now the configuration is done.
Lexer Rule ActionsAn ANTLR lexer creates a Token object after matching a lexical rule. Each request for a token starts in Lexer. nextToken , which calls emit once it has identified a token. emit collects information from the current state of the lexer to build the token.
ANTLR or ANother Tool for Language Recognition is a lexer and parser generator aimed at building and walking parse trees. It makes it effortless to parse nontrivial text inputs such as a programming language syntax.
I think that you will find that it is not backtracking in the manner that you expect. The reason is that it finds the {}
and then expects to see a v2_181
, which it doesn't find. because it doesn't then backtrack, it doesn't find the alternative that you want. The alternative is to just make the v2_181
optional, then you don't need the backtracking. Something like below:
grammar Java;
@parser::members {String ruleName; }
start : compilationUnitSuf EOF;
compilationUnitSuf
: {ruleName = "typeDeclarationSuf"; } s2
;
s2: '{' '}' v2_81?;
v2_81 : {ruleName.equals("enumBodyDeclarationsSuf")}? t173 | t173 '}';
t173: '}' | '{'*;
LBRACKET: '{';
RBRACKET: '}';
WS : [ \t\r\n\u000C]+ -> skip
;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With