 

Token Aliases in Antlr

Tags:

antlr4

I have rules that look something like this:

INTEGER : [0-9]+;
field3  : INTEGER COMMA INTEGER;

In the parse tree I get a List called INTEGER with two elements.

I would rather find a way for each of the elements to be named.

But if I do this:

INTEGER  : [0-9]+;
DOS      : INTEGER;
UNO      : INTEGER;
field3     : UNO COMMA DOS;

I still get the array of INTEGERs.

Am I doing this right, and do I just need to dig deeper to figure out what is wrong?

Is there some kind of syntax to alias INTEGER as UNO just for this command (that is actually what I would prefer)?

asked May 08 '16 23:05 by Be Kind To New Users




1 Answer

Just use labeling to identify the subterms:

field     : a=INTEGER COMMA b=INTEGER;

The generated FieldContext class will get two additional public fields:

Token a;
Token b;

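The parser assigns the first and second INTEGER tokens to a and b, respectively, so no aliasing is actually required in most cases. (Defining UNO and DOS as copies of INTEGER, as in the question, does not help: all three lexer rules match exactly the same text, and ANTLR resolves such ties in favor of the rule defined first, so the lexer only ever emits INTEGER.)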
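To make the difference between the label fields and the list accessor concrete, here is a plain-Python imitation of what the labeled rule gives you. It is not ANTLR runtime code: `Tok`, `tokenize` and `parse_field` are hypothetical names, and `FieldContext` is a hand-written stand-in that only mirrors the shape of the generated class (two label fields plus an `INTEGER()` list accessor). In the real generated Java class you would read the values with `ctx.a.getText()` and `ctx.b.getText()`.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Tok:
    # Hypothetical stand-in for an ANTLR token: a type name and the matched text.
    type: str
    text: str


@dataclass
class FieldContext:
    # Mirrors the label fields produced by `field : a=INTEGER COMMA b=INTEGER;`
    a: Tok
    b: Tok
    children: List[Tok]

    def INTEGER(self) -> List[Tok]:
        # Mirrors the unlabeled accessor: every INTEGER child, in order.
        return [t for t in self.children if t.type == "INTEGER"]


# Skips leading whitespace, then matches either an INTEGER or a COMMA.
TOKEN_RE = re.compile(r"\s*(?:(?P<INTEGER>[0-9]+)|(?P<COMMA>,))")


def tokenize(text: str) -> List[Tok]:
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(Tok(m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse_field(tokens: List[Tok]) -> FieldContext:
    # Hand-written equivalent of: field : a=INTEGER COMMA b=INTEGER ;
    types = [t.type for t in tokens]
    if types != ["INTEGER", "COMMA", "INTEGER"]:
        raise SyntaxError(f"expected INTEGER COMMA INTEGER, got {types}")
    return FieldContext(a=tokens[0], b=tokens[2], children=tokens)


ctx = parse_field(tokenize("12, 34"))
print(ctx.a.text, ctx.b.text)            # 12 34
print([t.text for t in ctx.INTEGER()])   # ['12', '34']
```

Both views stay available: the labels give each operand its own name, while the list accessor still returns all INTEGER children positionally.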

However, there can be valid reasons to change the type of a token, and this is typically handled in the lexer through the use of modes, actions, and predicates. For example, using modes, if integers alternate between the UNO and DOS types:

lexer grammar UD ;

fragment INT : [0-9]+ ;

UNO : INT -> mode(two) ;

mode two;
DOS : INT -> mode(DEFAULT_MODE) ;

When to do the mode switch and whether a different specific approach might be more appropriate will depend on details not provided yet.
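The alternating behavior of the mode-based lexer can be simulated in plain Python to see which token types come out. This is a sketch, not the ANTLR runtime: the rule table only imitates the `UD` grammar, and the COMMA and WS rules are assumptions added because the grammar above does not define them (WS is dropped, as `-> skip` would do). Rule selection follows ANTLR's strategy of taking the longest match and breaking ties in favor of the rule listed first.

```python
import re
from typing import List, Tuple

DEFAULT_MODE, TWO = "DEFAULT_MODE", "two"

# Per-mode rules: (token type, pattern, mode to switch to, or None to stay).
# COMMA and WS are assumed extras so that input like "12,34" can be lexed.
RULES = {
    DEFAULT_MODE: [("UNO", r"[0-9]+", TWO),
                   ("COMMA", r",", None),
                   ("WS", r"\s+", None)],
    TWO: [("DOS", r"[0-9]+", DEFAULT_MODE),
          ("COMMA", r",", None),
          ("WS", r"\s+", None)],
}


def lex(text: str) -> List[Tuple[str, str]]:
    mode, pos, tokens = DEFAULT_MODE, 0, []
    while pos < len(text):
        best = None
        # Longest match wins; on a tie the earlier rule is kept.
        for ttype, pattern, next_mode in RULES[mode]:
            m = re.compile(pattern).match(text, pos)
            if m and (best is None or m.end() > best[0].end()):
                best = (m, ttype, next_mode)
        if best is None:
            raise SyntaxError(f"no rule matches at {pos} in mode {mode}")
        m, ttype, next_mode = best
        if ttype != "WS":  # whitespace is discarded, like `-> skip`
            tokens.append((ttype, m.group()))
        if next_mode is not None:
            mode = next_mode
        pos = m.end()
    return tokens


print(lex("12,34"))    # [('UNO', '12'), ('COMMA', ','), ('DOS', '34')]
print(lex("1, 2, 3"))  # [('UNO', '1'), ('COMMA', ','), ('DOS', '2'), ('COMMA', ','), ('UNO', '3')]
```

The second call shows why the timing of the switch matters: the mode is lexer state that persists across the whole input, so a third integer comes out as UNO again regardless of which parser rule it belongs to.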

answered Dec 21 '22 22:12 by GRosenberg