All: I'm trying to write an ANTLR parser for some text that is formatted like this:
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA],
RP PROTEIN SEQUENCE OF 1-22; 2-17;
RP 240-256; 318-339 AND 381-390, AND CHARACTERIZATION.
Basically, every line starts with 'RP ' to indicate what the line of text is for, and the last line ends with "." to mark the end of this group of lines. The text itself can be anything; what I need in the end is the text.
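For reference, the result being asked for can be pinned down with a plain-Java sketch that uses no ANTLR at all. It assumes the block arrives as one string, that every line begins with exactly "RP ", and that the block ends at the first line ending in "."; the class name `RpTextExtractor` and method `extract` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RpTextExtractor {

    // Assumed prefix of every line in an RP block.
    private static final String PREFIX = "RP ";

    // Strips the "RP " prefix from each line, joins the pieces with single spaces,
    // and stops at the first line ending in "." (the terminating dot is dropped).
    public static String extract(String block) {
        List<String> parts = new ArrayList<>();
        for (String line : block.split("\n")) {
            if (!line.startsWith(PREFIX)) {
                throw new IllegalArgumentException("line without RP prefix: " + line);
            }
            String text = line.substring(PREFIX.length()).trim();
            if (text.endsWith(".")) {
                parts.add(text.substring(0, text.length() - 1));
                break; // last line of the block reached
            }
            parts.add(text);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String block = "RP NUCLEOTIDE SEQUENCE [GENOMIC DNA],\n"
                + "RP PROTEIN SEQUENCE OF 1-22; 2-17;\n"
                + "RP 240-256; 318-339 AND 381-390, AND CHARACTERIZATION.\n";
        // prints: NUCLEOTIDE SEQUENCE [GENOMIC DNA], PROTEIN SEQUENCE OF 1-22; 2-17; 240-256; 318-339 AND 381-390, AND CHARACTERIZATION
        System.out.println(extract(block));
    }
}
```

Joining with a single space is a choice made here so that words from adjacent lines do not run together; a grammar that simply skips the line headers would not insert any separator.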
I wrote an ANTLR grammar for this purpose:
grammar RefLine;
rp_line: RP_HEADER RP_TEXT;
RP_HEADER : 'RP ' -> pushMode(RP_FREE_TEXT_MODE);
mode RP_FREE_TEXT_MODE;
RP_HEADER_SKIP: '\nRP ' -> skip;
RP_TEXT: .+;
DOT_NEWLINE: '.\n' -> popMode;
The idea is that when the lexer sees the first RP_HEADER, it switches to RP_FREE_TEXT_MODE and therefore skips the 'RP ' headers at the start of the following lines. When it sees DOT_NEWLINE, it goes back to the default mode.
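The mode switching described above can be sketched as a hand-written scanner with an explicit mode stack, the same push/pop idea that pushMode/popMode express in the grammar. This is an illustration in plain Java, not ANTLR's generated lexer or its runtime API; the class `RpModeScanner`, the method `scan`, and the integer mode constants are hypothetical stand-ins for the grammar's DEFAULT_MODE and RP_FREE_TEXT_MODE.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RpModeScanner {

    // Hypothetical mode numbers mirroring the grammar's two modes.
    static final int DEFAULT_MODE = 0;
    static final int RP_FREE_TEXT_MODE = 1;

    // Returns the free text of each RP block found in the input.
    public static List<String> scan(String input) {
        Deque<Integer> modeStack = new ArrayDeque<>();
        int mode = DEFAULT_MODE;
        List<String> blocks = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            if (mode == DEFAULT_MODE) {
                if (input.startsWith("RP ", i)) {
                    // RP_HEADER: remember the current mode, enter free-text mode
                    modeStack.push(mode);
                    mode = RP_FREE_TEXT_MODE;
                    i += 3;
                } else {
                    i++; // other input in the default mode is ignored in this sketch
                }
            } else if (input.startsWith("\nRP ", i)) {
                // RP_HEADER_SKIP: drop the continuation header; a space is kept
                // so words from adjacent lines do not run together
                text.append(' ');
                i += 4;
            } else if (input.startsWith(".\n", i)) {
                // DOT_NEWLINE: the block is complete, return to the previous mode
                blocks.add(text.toString());
                text.setLength(0);
                mode = modeStack.pop();
                i += 2;
            } else {
                // RP_TEXT: accumulate one character of free text
                text.append(input.charAt(i));
                i++;
            }
        }
        // A block without a terminating ".\n" is discarded.
        return blocks;
    }

    public static void main(String[] args) {
        String input = "RP NUCLEOTIDE SEQUENCE [GENOMIC DNA],\n"
                + "RP PROTEIN SEQUENCE OF 1-22; 2-17;\n"
                + "RP 240-256; 318-339 AND 381-390, AND CHARACTERIZATION.\n";
        System.out.println(scan(input));
    }
}
```

The loop stops free text at the first ".\n", which is the behavior described above; it does not try to reproduce how ANTLR itself resolves `RP_TEXT: .+;` against DOT_NEWLINE.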
However, this grammar doesn't compile with ANTLR 4.1; it produces these messages:
[ERROR] Message{errorType=MODE_NOT_IN_LEXER, args=[RP_FREE_TEXT_MODE, org.antlr.v4.tool.Grammar@5c0662], e=null, fileName='RefLine.g4', line=7, charPosition=5}
[WARNING] Message{errorType=IMPLICIT_TOKEN_DEFINITION, args=[RP_TEXT], e=null, fileName='RefLine.g4', line=3, charPosition=19}
I don't quite understand why this error is produced. Can anyone explain the correct way to use lexer modes in ANTLR? Also, are tokens defined inside a mode not available to parser rules?
EDIT:
As @auselen suggested, I put the lexer grammar in a separate file, RefLineLex.g4:
lexer grammar RefLineLex;
RP_HEADER : 'RP ' -> pushMode(RP_FREE_TEXT_MODE);
mode RP_FREE_TEXT_MODE;
RP_HEADER_SKIP: '\nRP ' -> skip;
RP_TEXT: .+;
DOT_NEWLINE: '.\n' -> popMode;
And in a separate combined grammar, RefLine.g4, I have:
grammar RefLine;
import RefLineLex;
rp_line: RP_HEADER RP_TEXT ;
Now ANTLR compiles the file, but the RefLineLexer.java it generates contains:
private void RP_HEADER_action(RuleContext _localctx, int actionIndex) {
switch (actionIndex) {
case 0: pushMode(RP_FREE_TEXT_MODE); break;
}
}
The constant RP_FREE_TEXT_MODE is not defined anywhere in RefLineLexer.java.
Am I still missing something?
Lexer modes are only available in lexer grammars, not in combined grammars (lexer + parser). See the Lexer Rules page for some admittedly sparse documentation, and check the XML parser implementation on GitHub for an example.
You could have worked this out from the very informative errorType=MODE_NOT_IN_LEXER in the error message :)