
How to define tokens that can appear in multiple lexical modes in ANTLR4?

Tags: lexer, antlr4

I am learning ANTLR4 and have been experimenting with lexical modes. How can I have the same token appear in multiple lexical modes? As a very simple example, say my grammar has two modes and I want to match white space and end-of-lines in both of them. How can I do that without ending up with separate WS_MODE1 and WS_MODE2 tokens? Is there a way to reuse the same definition in both cases? I am hoping to get WS tokens in the output stream for all white space, irrespective of the mode. The same applies to EOL and to keywords that can appear in both modes.

medhat asked Apr 04 '13 09:04



1 Answer

The rules have to have different names, but you can use the -> type(...) lexer command to give them the same type.

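// Default mode: this rule defines the WS token type.
WS : [ \t]+;

mode Mode1;

    // Distinct rule name, but the token is emitted with type WS.
    Mode1_WS : WS -> type(WS);

mode Mode2;

    Mode2_WS : WS -> type(WS);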

Even though Mode1_WS and Mode2_WS are not fragment rules, the code generator will see the type command and know that you reassigned their types, so it will not define tokens for them.
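The same pattern should carry over to EOL and to keywords, as asked in the question. Below is a minimal sketch of a complete lexer grammar, not a definitive implementation. The grammar name SharedTokens, the mode name Inner, and the rules EOL, KW_IF, OPEN and CLOSE are hypothetical names chosen for illustration, and the braces that switch modes are an assumption about the input language. Note that lexical modes are only allowed in a lexer grammar, not in a combined grammar.

```antlr
lexer grammar SharedTokens;   // hypothetical grammar name

// Default mode: these rules define the token types WS, EOL and KW_IF.
WS      : [ \t]+ ;
EOL     : '\r'? '\n' ;
KW_IF   : 'if' ;
OPEN    : '{' -> pushMode(Inner) ;   // assumed trigger for entering the second mode

mode Inner;

// Rule names must be unique across modes, but type(...) maps each
// rule back onto a token type already defined in the default mode.
Inner_WS    : WS    -> type(WS) ;
Inner_EOL   : EOL   -> type(EOL) ;
Inner_KW_IF : KW_IF -> type(KW_IF) ;
CLOSE       : '}'   -> popMode ;     // assumed trigger for returning to the default mode
```

With a grammar along these lines, a parser that uses the lexer's token vocabulary only ever refers to WS, EOL and KW_IF, regardless of which mode produced the token.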

Sam Harwell answered Oct 24 '22 09:10