
Range element cannot be used in parser rule?

Tags: antlr

I have the following grammar:

grammar tryout;

tryout :  my_cmd
        ;

my_cmd
    : 'start'   '0'..'9'+  Name_string
    ;

Digit
    : '0'..'9'
    ;

Name_string
    : ('A'..'Z' | 'a'..'z')  ('A'..'Z' | 'a'..'z' | '0'..'9' | '_')*
    ;

When I view the diagram in ANTLRWorks, '0'..'9'+ shows up as an empty element, and compiling the generated Java code fails because it contains an empty "if ()" statement. Generating and compiling from the command line fails the same way.

The fix is to use a lexer rule (Digit+) in place of '0'..'9'+:

grammar tryout;

tryout :  my_cmd
        ;

my_cmd
    : 'start'   Digit+  Name_string
    ;

Digit
    : '0'..'9'
    ;

Name_string
    : ('A'..'Z' | 'a'..'z')  ('A'..'Z' | 'a'..'z' | '0'..'9' | '_')*
    ;

But I wonder whether this is a bug. Why can't the range element be used in a parser rule? This is on ANTLR v3.4.

asked Nov 13 '22 15:11 by my_question


1 Answer

Inside parser rules, .. does not act as a character range operator the way it does inside lexer rules. Also note that even though you wrote the literals inside a parser rule, ANTLR creates lexer rules for them on the fly, which makes the following:

my_cmd
 : 'start' '0'..'9'+ Name_string
 ;

equivalent to:

my_cmd
 : Start D0..D9+ Name_string
 ;

Start : 'start';
D0    : '0';
D9    : '9';

If memory serves me, earlier versions of ANTLR v3 supported the range operator inside parser rules to mean "match any token between D0 and D9", but this is/was extremely fragile. Adding a rule between D0 and D9 would change its meaning:

D0    : '0';
FOO   : 'foo';
D9    : '9';

The parser rule:

my_cmd
 : '0'..'9'+
 ;

would now match one of the following tokens: D0, FOO or D9.
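
To make the fragility concrete, here is a minimal Python sketch of a token-type range. It is not ANTLR code. It assumes token types are consecutive integers handed out in definition order, and that a range such as D0..D9 matches every token whose type lies between the two endpoints. The helper names and the starting number 4 are hypothetical, chosen only for illustration.

```python
def assign_token_types(rule_names, first_type=4):
    """Give each token name a consecutive integer type, in definition order.

    The starting value is an arbitrary assumption for this sketch.
    """
    return {name: first_type + i for i, name in enumerate(rule_names)}


def tokens_in_range(token_types, low, high):
    """Return the token names whose type lies between low and high, inclusive."""
    lo, hi = token_types[low], token_types[high]
    return [name for name, t in token_types.items() if lo <= t <= hi]


# Only D0 and D9 are defined: the range covers exactly those two tokens.
before = assign_token_types(["D0", "D9"])
# FOO is defined between them: it silently becomes part of the range.
after = assign_token_types(["D0", "FOO", "D9"])

print(tokens_in_range(before, "D0", "D9"))  # ['D0', 'D9']
print(tokens_in_range(after, "D0", "D9"))   # ['D0', 'FOO', 'D9']
```

Under these assumptions, the range's meaning depends entirely on where the other lexer rules happen to be declared, not on anything about digits.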

Support for .. inside parser rules was removed in v3.3 (at least) and later. So, don't use .. inside parser rules.

answered Dec 15 '22 00:12 by Bart Kiers