
ANTLR4 lexer rule with @init block

Tags:

antlr4

I have this lexer rule defined in my ANTLR v3 grammar file; it matches text in double quotes. I need to convert it to ANTLR v4. The ANTLR compiler throws the error 'syntax error: mismatched input '@' expecting COLON while matching a lexer rule' (on the @init line). Can a lexer rule contain an @init block? How should this be rewritten?

DOUBLE_QUOTED_CHARACTERS
@init 
{
   int doubleQuoteMark = input.mark(); 
   int semiColonPos = -1;
}
: ('"' WS* '"') => '"' WS* '"' { $channel = HIDDEN; }
{
    RecognitionException re = new RecognitionException("Illegal empty quotes\"\"!", input);
    reportError(re);
}
| '"' (options {greedy=false;}: ~('"'))+ 
  ('"'|';' { semiColonPos = input.index(); } ('\u0020'|'\t')* ('\n'|'\r'))
{ 
    if (semiColonPos >= 0)
    {
        input.rewind(doubleQuoteMark);

        RecognitionException re = new RecognitionException("Missing closing double quote!", input);
        reportError(re);
        input.consume();            
    }
    else
    {
        setText(getText().substring(1, getText().length()-1));
    }
}
; 

Sample data:

  1. " " -> reports the error "Illegal empty quotes!"
  2. "asd -> reports the error "Missing closing double quote!"
  3. "text" -> returns text (valid input, the content of "...")
Adrian asked Nov 11 '22 00:11


1 Answer

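An ANTLR 4 lexer rule cannot take an @init block, which is what the error is complaining about. I think the right way to do this is to move the initialization into an action at the start of the rule and wrap the alternatives in parentheses: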

DOUBLE_QUOTED_CHARACTERS
:
{
   int doubleQuoteMark = input.mark();
   int semiColonPos = -1;
}
(
    ('"' WS* '"') => '"' WS* '"' { $channel = HIDDEN; }
    {
        RecognitionException re = new RecognitionException("Illegal empty quotes\"\"!", input);
        reportError(re);
    }
    | '"' (options {greedy=false;}: ~('"'))+
      ('"'|';' { semiColonPos = input.index(); } ('\u0020'|'\t')* ('\n'|'\r'))
    {
        if (semiColonPos >= 0)
        {
            input.rewind(doubleQuoteMark);

            RecognitionException re = new RecognitionException("Missing closing double quote!", input);
            reportError(re);
            input.consume();
        }
        else
        {
            setText(getText().substring(1, getText().length()-1));
        }
    }
)
;

There are some other errors in the rule above as well, such as the WS ... => ... part, but I am not correcting them in this answer, just to keep things simple. I took the hint from here.

To hedge against that link moving or becoming invalid after some time, I am quoting the text as is:

Lexer actions can appear anywhere as of 4.2, not just at the end of the outermost alternative. The lexer executes the actions at the appropriate input position, according to the placement of the action within the rule. To execute a single action for a rule that has multiple alternatives, you can enclose the alts in parentheses and put the action afterwards:

END : ('endif'|'end') {System.out.println("found an end");} ;

The action conforms to the syntax of the target language. ANTLR copies the action’s contents into the generated code verbatim; there is no translation of expressions like $x.y as there is in parser actions.
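For the rule in the question, this is presumably why { $channel = HIDDEN; } does not carry over: the text would be copied into the generated lexer as is, with no special handling of $channel. ANTLR 4 offers the channel lexer command for this instead. A minimal sketch, where LINE_COMMENT is a hypothetical rule name used only for illustration:

```antlr
// ANTLR 3 style, relying on attribute translation inside the action:
//   LINE_COMMENT : '//' ~('\n'|'\r')* { $channel = HIDDEN; } ;

// ANTLR 4 style: a lexer command at the end of the alternative.
LINE_COMMENT
    : '//' ~[\r\n]* -> channel(HIDDEN)
    ;
```

A command such as -> channel(HIDDEN) goes at the very end of an alternative, after any actions.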

Only actions within the outermost token rule are executed. In other words, if STRING calls ESC_CHAR and ESC_CHAR has an action, that action is not executed when the lexer starts matching in STRING.
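Putting the quoted points together, one way the original rule might be restructured for ANTLR 4 is to drop the syntactic predicate, the greedy option and the input rewinding, and to give each of the three sample cases its own rule, so that the lexer's usual "longest match wins, earlier rule wins ties" behaviour picks between them. This is only a sketch under assumptions, not a drop-in replacement: EMPTY_QUOTES, UNTERMINATED_QUOTE and reportQuoteError are hypothetical names, the Java target is assumed, and a quoted string is assumed to end on the line where it starts, which is simpler than the semicolon check in the original.

```antlr
@lexer::members {
    // Hypothetical helper: forwards a message to the registered error listeners,
    // using the start position of the token currently being matched.
    private void reportQuoteError(String msg) {
        getErrorListenerDispatch().syntaxError(
            this, null, _tokenStartLine, _tokenStartCharPositionInLine, msg, null);
    }
}

// Sample 1: quotes that contain nothing, or only spaces and tabs.
// Listed first so that it wins the tie against DOUBLE_QUOTED_CHARACTERS on input like " ".
EMPTY_QUOTES
    : '"' [ \t]* '"'
      { reportQuoteError("Illegal empty quotes\"\"!"); } -> channel(HIDDEN)
    ;

// Sample 3: properly closed text; the token text is stripped of its quotes.
DOUBLE_QUOTED_CHARACTERS
    : '"' ~["\r\n]+ '"'
      { setText(getText().substring(1, getText().length() - 1)); }
    ;

// Sample 2: an opening quote with no closing quote before the end of the line.
// It only wins when no closing quote is found, because a closed string is a longer match.
UNTERMINATED_QUOTE
    : '"' ~["\r\n]*
      { reportQuoteError("Missing closing double quote!"); } -> channel(HIDDEN)
    ;
```

Keeping shared state in @lexer::members rather than in a leading action also avoids relying on a local variable declared in one action being usable in another, since each action is copied into the generated code separately.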
Tyagi Akhilesh answered Dec 10 '22 15:12