ANTLR : no viable alternative error

I have a task to write a simple parser generator, so I wrote an ANTLR-like grammar and tried to parse a simple file such as "foo:bar;", but I got the following output:

[@0,0:2='foo',<1>,1:0]
[@1,3:3=':',<16>,1:3]
[@2,4:6='bar',<1>,1:4]
[@3,7:7=';',<18>,1:7]
[@4,8:7='<EOF>',<-1>,1:8]
line 1:0 no viable alternative at input 'foo'
(rule foo : bar ;)

My grammar looks like

grammar parsGen;

gram : rule SEMICOLON (NEWLINE+ rule SEMICOLON)* ;

rule : lRule | pRule ;

lRule : LRULEID COLON lRule1 ;
lRule1 : (((LRULEID | STRING | SET) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE+)+ ;

pRule : PRULEID COLON pRule1 ;
pRule1 : (((LRULEID | PRULEID) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE+)+ ;

modificator : PLUS | ASTERISK | QUESTION ;

ID : LRULEID | PRULEID ;

LRULEID : UPPERLETTER (UPPERLETTER | LOWERLETTER | DIGIT)* ;
PRULEID : LOWERLETTER (UPPERLETTER | LOWERLETTER | DIGIT)* ;

STRING : ('\''.*?'\'') ;
SET : '\''.*?'\'..\''.*?'\'' ;

UPPERLETTER : [A-Z] ;
LOWERLETTER : [a-z] ;
DIGIT : [0-9] ;

NEWLINE : '\r\n'|'\n'|'\r' ;

PLUS : '+' ;
ASTERISK : '*' ;
QUESTION : '?' ;

LBRACE : '(' ;
RBRACE : ')' ;

SPACE : ' ' ;

COLON : ':' ;

PIPE : '|' ;

SEMICOLON : ';' ;

So where could I have made a mistake? I searched everywhere (Google, SO, etc.) for the "no viable alternative" error, but it didn't really help me.

Yaroslav Skudarnov asked May 31 '13 12:05


2 Answers

ANTLR lexers assign exactly one token type to each token before the parser ever runs. When several lexer rules match the same text, the rule that appears first in the grammar wins. In your grammar, a token cannot have the type ID and the type PRULEID at the same time. Since the input foo matches both of these lexer rules and ID appears first, your tokens are: ID, COLON, ID, SEMICOLON, <EOF>. The parser expects LRULEID or PRULEID at the start of a rule, so it reports "no viable alternative" at the first token.
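
The selection policy described above can be sketched with a toy lexer. This is only an illustrative Python model, not ANTLR itself: the `lex` helper and the regular expressions standing in for the lexer rules are assumptions made for this example. It picks the longest match and breaks ties in favour of the rule listed first.

```python
import re

def lex(text, rules):
    """Toy lexer. `rules` is a list of (name, regex) pairs in grammar order.
    The longest match wins; on a tie the rule listed first is kept."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule keeps priority on equal length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Regex stand-ins for the lexer rules in the question, in the same order.
ORIGINAL_RULES = [
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("LRULEID", r"[A-Z][A-Za-z0-9]*"),
    ("PRULEID", r"[a-z][A-Za-z0-9]*"),
    ("COLON", r":"),
    ("SEMICOLON", r";"),
]
# Same rules with the ID rule removed (Option 1 in this answer).
FIXED_RULES = [rule for rule in ORIGINAL_RULES if rule[0] != "ID"]

def token_types(text, rules):
    return [name for name, _ in lex(text, rules)]

print(token_types("foo:bar;", ORIGINAL_RULES))  # ['ID', 'COLON', 'ID', 'SEMICOLON']
print(token_types("foo:bar;", FIXED_RULES))     # ['PRULEID', 'COLON', 'PRULEID', 'SEMICOLON']
```

With the ID rule present, no token ever gets the type PRULEID, which is why pRule cannot start.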

Since the ID token is never actually referenced in the parser, I suggest one of the following changes. Either of these options will resolve the problem you have described, so the choice is entirely your preference for how the final grammar looks.

Foreword

You need to change the space references from SPACE+ to SPACE*, or the rule will require at least one space character between bar and ;.
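
To see why the quantifier matters, here is a rough Python sketch that models a simplified pRule followed by SEMICOLON as a regular expression over token type names. The simplification (no parenthesised groups, no modificator) and the helper names are assumptions for illustration only; the real parser is not a regex.

```python
import re

# Token types for "foo:bar;" once identifiers are lexed as PRULEID.
TOKENS = "PRULEID COLON PRULEID SEMICOLON"

def rule_regex(space_quantifier):
    # Simplified shape: PRULEID COLON ((LRULEID | PRULEID) SPACE<q>)+ SEMICOLON
    return re.compile(
        r"PRULEID COLON (?:(?:LRULEID|PRULEID) (?:SPACE )"
        + space_quantifier
        + r")+SEMICOLON"
    )

def accepts(space_quantifier, tokens=TOKENS):
    """True if the whole token sequence matches the simplified rule."""
    return rule_regex(space_quantifier).fullmatch(tokens) is not None

print(accepts("+"))  # False: a SPACE token is required before SEMICOLON
print(accepts("*"))  # True: SPACE is optional
```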

Option 1

Remove the ID lexer rule altogether.

Option 2

  1. Change ID to a parser rule so it's not trying to assign token type ID to all of your identifiers.

    id : LRULEID | PRULEID;
    
  2. Update pRule1 rule by referencing id.

    pRule1 : ((id | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE*)+ ;
    

Unrelated Side Note

Your grammar might be easier to read if you remove the outermost + closure inside the lRule1 and pRule1 rules and instead add it to the rule references themselves, like this. Note that I changed the SPACE references as described in the foreword.

lRule : LRULEID COLON lRule1+ ;
lRule1 : ((LRULEID | STRING | SET) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE* ;

pRule : PRULEID COLON pRule1+ ;
pRule1 : ((LRULEID | PRULEID) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE* ;
Sam Harwell answered Oct 25 '22 16:10


Also, from the documentation at http://www.antlr.org/api/Java/org/antlr/v4/runtime/NoViableAltException.html:

Indicates that the parser could not decide which of two or more paths to take based upon the remaining input. It tracks the starting token of the offending input and also knows where the parser was in the various paths when the error occurred.

In my case, I was calling lexer.nextToken() before parsing for debugging purposes. Because I did not call lexer.reset() afterwards, the lexer had already consumed its input by the time the parser ran, which caused a "no viable alternative at input EOF" error.
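
The effect can be illustrated with a toy model. The `ToyLexer` class below is a hypothetical stand-in written for this example, not the ANTLR runtime API; it only mimics the idea that a lexer keeps a position that a debugging loop can exhaust.

```python
class ToyLexer:
    """Hypothetical stand-in for a generated lexer (not the ANTLR API)."""
    EOF = "<EOF>"

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._index = 0

    def next_token(self):
        # Once the input is exhausted, keep returning EOF.
        if self._index >= len(self._tokens):
            return self.EOF
        token = self._tokens[self._index]
        self._index += 1
        return token

    def reset(self):
        # Rewind to the start of the input.
        self._index = 0

def remaining(lexer):
    """Read tokens up to and including EOF, as a parser's token stream would."""
    out = []
    while True:
        token = lexer.next_token()
        out.append(token)
        if token == ToyLexer.EOF:
            return out

lexer = ToyLexer(["foo", ":", "bar", ";"])

# Debugging loop that reads every token before parsing.
while lexer.next_token() != ToyLexer.EOF:
    pass

seen_without_reset = remaining(lexer)  # only EOF is left for the "parser"
lexer.reset()
seen_after_reset = remaining(lexer)    # the full token sequence again

print(seen_without_reset)  # ['<EOF>']
print(seen_after_reset)    # ['foo', ':', 'bar', ';', '<EOF>']
```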

Vitaly Sazanovich answered Oct 25 '22 17:10