I am learning ANTLR4 and was trying to play with lexical modes. How can I have the same token appear in multiple lexical modes? As a very simple example, say my grammar has two modes and I want to match whitespace and end-of-line in both of them. How can I do that without ending up with WS_MODE1 and WS_MODE2, for example? Is there a way to reuse the same definition in both cases? I am hoping to get WS tokens in the output stream for all whitespace, irrespective of the mode. The same applies to EOL and other keywords that can appear in both modes.
A lexer (often called a scanner) breaks up an input stream of characters into vocabulary symbols for a parser, which applies a grammatical structure to that symbol stream.
Enter the package name you want declared in the generated lexer and parser Java files. Select the target language for the output, such as Java or Python. Tick the "generate parse tree listener" and "generate parse tree visitor" options if you want to customize the visitor. The configuration is now done.
A token is primarily defined via a lexer rule (lexical rule). Example: the lexical rule LOWERCASE captures a string of lowercase characters.
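A rule like the LOWERCASE one described above could be written roughly as follows. This is only a minimal sketch: the grammar name LowercaseDemo is made up for illustration, and the character set assumes plain ASCII letters.

```antlr
lexer grammar LowercaseDemo;   // hypothetical grammar name

// One or more ASCII lowercase letters form a single LOWERCASE token.
LOWERCASE : [a-z]+ ;
```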
You should include an explicit EOF at the end of your entry rule any time you are trying to parse an entire input file. If you do not include the EOF, you are not asking the parser to consume the entire input, so it is allowed to parse only a portion of the input if that avoids a syntax error.
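To make the difference concrete, here is a small sketch of two entry rules, one with EOF and one without. The grammar name and all rule names (file, partial, line, ID, NEWLINE) are invented for this illustration.

```antlr
grammar EofDemo;   // hypothetical grammar name

// Entry rule with EOF: the parser must consume the whole input,
// so unexpected trailing text is reported as a syntax error.
file    : line* EOF ;

// Entry rule without EOF: the parser may stop before the first token
// it cannot match and leave the rest of the input unread.
partial : line* ;

line    : ID+ NEWLINE ;

ID      : [a-zA-Z]+ ;
NEWLINE : '\r'? '\n' ;
WS      : [ \t]+ -> skip ;
```

Starting the parse from file is the usual choice when the input is a complete file; partial is only appropriate when reading a prefix of the input is acceptable.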
The rules have to have different names, but you can use the -> type(...) lexer command to give them the same type.
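// Default mode: WS is defined once and owns the token type.
WS : [ \t]+;

mode Mode1;
// Matches the same text as WS and is emitted with type WS.
Mode1_WS : WS -> type(WS);

mode Mode2;
Mode2_WS : WS -> type(WS);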
Even though Mode1_WS and Mode2_WS are not fragment rules, the code generator will see the type command and know that you reassigned their types, so it will not define tokens for them.
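For a fuller picture, here is a sketch of how the same idea might look in a complete lexer grammar that also covers EOL and actually switches modes. Everything beyond WS, Mode1_WS and the type command is an assumption made for illustration: the grammar name, the OPEN/CLOSE rules that enter and leave Mode1, and the TEXT and NAME rules are hypothetical.

```antlr
lexer grammar SharedTokens;   // hypothetical grammar name

// Default mode: the shared token types are defined once, here.
WS    : [ \t]+ ;
EOL   : '\r'? '\n' ;
OPEN  : '<' -> pushMode(Mode1) ;   // hypothetical trigger for entering Mode1
TEXT  : ~[ \t\r\n<]+ ;

mode Mode1;
// Distinct rule names, but the emitted tokens carry the shared types.
Mode1_WS  : WS  -> type(WS) ;
Mode1_EOL : EOL -> type(EOL) ;
NAME      : [a-zA-Z]+ ;
CLOSE     : '>' -> popMode ;       // hypothetical trigger for leaving Mode1
```

With a grammar along these lines, a parser rule only needs to refer to WS and EOL, regardless of which mode the lexer was in when the text was matched. A second mode would repeat the same pattern with its own rule names, for example Mode2_WS and Mode2_EOL.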