How can my ANTLR lexer match a token made of characters that are a subset of another kind of token?

Question

I have what I think is a simple ANTLR question. I have two token types: ident and special_ident. I want my special_ident to match a single letter followed by a single digit. I want the generic ident to match a single letter, optionally followed by any number of letters or digits. My (incorrect) grammar is below:

expr 
    : special_ident
    | ident
    ;

special_ident : LETTER DIGIT;
ident         : LETTER (LETTER | DIGIT)*;

LETTER : 'A'..'Z';
DIGIT  : '0'..'9';

When I try to check this grammar, I get this warning:

Decision can match input such as "LETTER DIGIT" using multiple alternatives: 1, 2. As a result, alternative(s) 2 were disabled for that input

I understand that my grammar is ambiguous and that input such as A1 could match either ident or special_ident. I really just want the special_ident to be used in the narrowest of cases.
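One quick way to see the ambiguity outside ANTLR is to transcribe both rules into java.util.regex patterns and check which inputs satisfy both. This is only an illustration: the regexes are a hand translation of the two parser rules, assuming LETTER = [A-Z] and DIGIT = [0-9] as in the lexer rules, and the class name is hypothetical.

```java
import java.util.List;

// Hand translation of the question's two rules into regexes (an assumption,
// not ANTLR's own analysis). String.matches requires a full-string match.
class AmbiguityCheck {
    static final String SPECIAL_IDENT = "[A-Z][0-9]";     // special_ident : LETTER DIGIT;
    static final String IDENT         = "[A-Z][A-Z0-9]*"; // ident : LETTER (LETTER | DIGIT)*;

    // True when an input satisfies both rules, i.e. the parser has two valid choices.
    static boolean isAmbiguous(String input) {
        return input.matches(SPECIAL_IDENT) && input.matches(IDENT);
    }

    public static void main(String[] args) {
        for (String s : List.of("A", "A1", "A1A", "A12", "AA1")) {
            System.out.println(s + " ambiguous? " + isAmbiguous(s));
        }
    }
}
```

Only A1 is reported as ambiguous: every string that special_ident accepts is also accepted by ident, which is exactly the overlap the ANTLR warning complains about.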

Here's some sample input and what I'd like it to match:

A      : ident
A1     : special_ident
A1A    : ident
A12    : ident
AA1    : ident
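The table above can be written down as a small reference classifier, handy for checking a grammar against expected results. This is a plain-Java sketch, not ANTLR code; it assumes identifiers contain only A-Z and 0-9, and it expresses "use special_ident only in the narrowest case" by testing the narrower pattern first. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Reference statement of the desired classification (plain Java, not a grammar).
class IdentClassifier {
    // Returns "special_ident", "ident", or null if the input is not an identifier.
    static String classify(String input) {
        if (input.matches("[A-Z][0-9]")) {     // narrowest case is checked first
            return "special_ident";
        }
        if (input.matches("[A-Z][A-Z0-9]*")) { // anything else starting with a letter
            return "ident";
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("A", "ident");
        expected.put("A1", "special_ident");
        expected.put("A1A", "ident");
        expected.put("A12", "ident");
        expected.put("AA1", "ident");
        for (Map.Entry<String, String> e : expected.entrySet()) {
            String actual = classify(e.getKey());
            System.out.println(e.getKey() + " : " + actual
                    + (e.getValue().equals(actual) ? "" : "  (MISMATCH)"));
        }
    }
}
```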

How can I form my grammar such that I correctly identify my two types of identifiers?

Carl Smotricz · Accepted Answer

It seems that you have three cases (A = letter, N = digit):

A
AN
A(A|N)(A|N)+

You could classify the middle one as special_ident and the other two as ident; that should do the trick.

I'm a bit rusty with ANTLR, so I hope this hint is enough. I can try to write out the rules for you, but they could be wrong:

long_ident    : LETTER (LETTER | DIGIT) (LETTER | DIGIT)+;
special_ident : LETTER DIGIT;
ident         : LETTER | long_ident;

WayneH · Answer

Expanding on Carl's thought, I would guess you have four different cases:

A
AN
AA(A|N)*
AN(A|N)+
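The four cases above can be transcribed into regexes (A = [A-Z], N = [0-9]) and checked by brute force to confirm that every identifier falls into exactly one case, so there is no overlap left to resolve. This is a sketch under that transcription assumption; the reduced alphabet {A, B, 1, 2} and the class name are illustrative choices, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;

// The four cases as regexes; String.matches requires a full-string match.
class FourCases {
    static final String[] CASES = {
        "[A-Z]",               // 1: A         -> ident
        "[A-Z][0-9]",          // 2: AN        -> special_ident
        "[A-Z][A-Z][A-Z0-9]*", // 3: AA(A|N)*  -> ident
        "[A-Z][0-9][A-Z0-9]+", // 4: AN(A|N)+  -> ident
    };

    // How many of the four cases accept the input.
    static int matchCount(String input) {
        int n = 0;
        for (String c : CASES) {
            if (input.matches(c)) n++;
        }
        return n;
    }

    // All strings of length 1..maxLen starting with a letter, over {A, B, 1, 2}.
    static List<String> identifiers(int maxLen) {
        List<String> out = new ArrayList<>();
        List<String> frontier = new ArrayList<>(List.of("A", "B"));
        for (int len = 1; len <= maxLen; len++) {
            out.addAll(frontier);
            List<String> next = new ArrayList<>();
            for (String s : frontier) {
                for (char ch : new char[] {'A', 'B', '1', '2'}) {
                    next.add(s + ch);
                }
            }
            frontier = next;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> all = identifiers(4);
        for (String s : all) {
            if (matchCount(s) != 1) {
                System.out.println("Not exactly one case: " + s);
            }
        }
        System.out.println("checked " + all.size() + " identifiers");
    }
}
```

The cases are disjoint by construction: length 1 is case 1; otherwise the second character (letter or digit) and the length (2 or more) pick exactly one of cases 2 to 4.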

Only option 2 should be a special_ident; the other three should be ident. Every case can be told apart by syntax alone. Here is a quick grammar that I tested in ANTLRWorks, and it appeared to work properly. I think Carl's version has one bug: it does not accept AA, because his long_ident requires at least three characters. Still, getting you 99% of the way there is a huge help, so this is only a minor modification of his quick idea.
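To see the AA gap concretely, Carl's three rules from the accepted answer can be transcribed into regexes and tried against a few inputs. The regex translation is an assumption made for illustration (LETTER = [A-Z], DIGIT = [0-9]), and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Carl's rules as regexes; String.matches requires a full-string match.
class CarlRulesGap {
    static final String SHORT_IDENT   = "[A-Z]";                  // ident : LETTER | long_ident;
    static final String SPECIAL_IDENT = "[A-Z][0-9]";             // special_ident : LETTER DIGIT;
    static final String LONG_IDENT    = "[A-Z][A-Z0-9][A-Z0-9]+"; // long_ident : LETTER (L|D) (L|D)+;

    // True if none of the three rules accepts the input.
    static boolean unmatched(String input) {
        return !input.matches(SHORT_IDENT)
            && !input.matches(SPECIAL_IDENT)
            && !input.matches(LONG_IDENT);
    }

    public static void main(String[] args) {
        List<String> gaps = new ArrayList<>();
        for (String s : List.of("A", "A1", "AA", "A1A", "A12", "AA1")) {
            if (unmatched(s)) gaps.add(s);
        }
        System.out.println("Matched by no rule: " + gaps); // prints "Matched by no rule: [AA]"
    }
}
```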

prog 
    :    (expr WS)+ EOF;

expr 
    : special_ident {System.out.println("Found special_ident:" + $special_ident.text + "\n");}
    | ident {System.out.println("Found ident:" + $ident.text + "\n");}
    ;

special_ident : LETTER DIGIT;

ident         : LETTER 
    |LETTER DIGIT (LETTER|DIGIT)+
    |LETTER LETTER (LETTER|DIGIT)*;

LETTER : 'A'..'Z';
DIGIT  : '0'..'9';
WS 
    :   (' '|'\t'|'\n'|'\r')+;
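For a quick check without generating a parser, the behaviour of prog can be imitated in plain Java: split the input on the characters WS matches and classify each word using the same four alternatives, printing the same messages as the actions. This is a lenient approximation written for illustration (it does not use ANTLR, and unlike prog it does not require whitespace after the last word); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for prog : (expr WS)+ EOF; (not the generated parser).
class ProgSimulation {
    static String classify(String word) {
        if (word.matches("[A-Z]")) return "ident";                 // LETTER
        if (word.matches("[A-Z][0-9]")) return "special_ident";    // special_ident : LETTER DIGIT
        if (word.matches("[A-Z][0-9][A-Z0-9]+")) return "ident";   // LETTER DIGIT (LETTER|DIGIT)+
        if (word.matches("[A-Z][A-Z][A-Z0-9]*")) return "ident";   // LETTER LETTER (LETTER|DIGIT)*
        throw new IllegalArgumentException("no viable alternative for: " + word);
    }

    // Splits on the characters WS accepts: space, tab, newline, carriage return.
    static List<String> run(String input) {
        List<String> lines = new ArrayList<>();
        for (String word : input.trim().split("[ \t\r\n]+")) {
            lines.add("Found " + classify(word) + ":" + word);
        }
        return lines;
    }

    public static void main(String[] args) {
        run("A A1 A1A A12 AA1\n").forEach(System.out::println);
    }
}
```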


Tags: grammar, antlr

Chris Farmer
