
Context-sensitive whitespace handling in ANTLR4

I'm trying to implement an expression/formula language in ANTLR4 and am having a problem with whitespace handling. In most cases I don't care about whitespace, so I have the "standard" lexer rule that sends it to the HIDDEN channel:

// Whitespace
WS
    :   ( ' ' | '\t' | '\r' | '\n' ) -> channel(HIDDEN)
    ;

However, I have one operator that must not have whitespace either before or after it, and I can't see how to handle this without changing the WS lexer rule to leave whitespace on the default channel and adding explicit WS? terms to all of my other parser rules (there are quite a lot of them).

As a simplified example, I created the following grammar for an imaginary predicate language:

grammar Logik;

/*
 * Parser Rules
 */

ruleExpression
    :   orExpression
    ;

orExpression
    :   andExpression ( 'OR' andExpression)*
    ;

andExpression
    :   primaryExpression ( 'AND' primaryExpression)*
    ;

primaryExpression
    :   variableExpression
    |   '(' ruleExpression ')'
    ;

variableExpression
    :   IDENTIFIER ( '.' IDENTIFIER )*
    ;

/*
 * Lexer Rules
 */

IDENTIFIER
    :   LETTER LETTERORDIGIT*
    ;

fragment LETTER : [a-zA-Z_];
fragment LETTERORDIGIT : [a-zA-Z0-9_];

// Whitespace
WS
    :   ( ' ' | '\t' | '\r' | '\n' ) -> channel(HIDDEN)
    ;

As it stands, this successfully parses both A OR B AND C.D and A OR B AND C. D; what I need is for the . operator to reject surrounding whitespace, so that the second expression is invalid.
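To illustrate why the hidden channel makes the two inputs indistinguishable to the parser rules, here is a rough Java sketch (Java 16+, for records). It assumes a hand-written tokenizer, the hypothetical HiddenChannelDemo class, in place of the lexer ANTLR would generate from the Logik grammar, and it only covers identifiers, '.', and whitespace. Whitespace tokens are kept but marked HIDDEN, and the "parser view" is simply the list of default-channel token types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the lexer ANTLR would generate from the Logik grammar.
// It only knows IDENTIFIER, '.', and whitespace; whitespace is kept but put on HIDDEN.
class HiddenChannelDemo {
    static final int DEFAULT = 0, HIDDEN = 1;

    record Tok(String type, int channel, String text) {}

    static boolean isLetter(char c) {            // fragment LETTER : [a-zA-Z_]
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    static boolean isLetterOrDigit(char c) {     // fragment LETTERORDIGIT : [a-zA-Z0-9_]
        return isLetter(c) || (c >= '0' && c <= '9');
    }

    static List<Tok> lex(String src) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                // One WS token per whitespace character, sent to the hidden channel.
                out.add(new Tok("WS", HIDDEN, String.valueOf(c)));
                i++;
            } else if (isLetter(c)) {
                int start = i;
                while (i < src.length() && isLetterOrDigit(src.charAt(i))) i++;
                out.add(new Tok("IDENTIFIER", DEFAULT, src.substring(start, i)));
            } else if (c == '.') {
                out.add(new Tok("'.'", DEFAULT, "."));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    // The token types the parser rules actually match against: default channel only.
    static List<String> parserView(String src) {
        return lex(src).stream()
                .filter(t -> t.channel() == DEFAULT)
                .map(Tok::type)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parserView("C.D"));   // [IDENTIFIER, '.', IDENTIFIER]
        System.out.println(parserView("C. D"));  // [IDENTIFIER, '.', IDENTIFIER]
        System.out.println(lex("C. D").size());  // 4: the WS token is still in the stream
    }
}
```

Both inputs produce the same default-channel sequence, so no parser rule can tell them apart on its own; the WS token is, however, still present in the full token stream, which is what a predicate can inspect.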

asked Mar 12 '15 12:03 by Jim Paton


1 Answer

You can look at tokens on other channels (such as the hidden WS tokens) from a semantic predicate, like this:

variableExpression
    :   IDENTIFIER ( '.' {_input.get(_input.index() - 1).getType() != WS}? IDENTIFIER )*
    ;

With this rule, A OR B AND C.D parses successfully, while A OR B AND C. D reports an error.
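The predicate works because CommonTokenStream keeps hidden tokens in its buffer: once '.' has been consumed, _input.index() points at the next default-channel token, so the token at index() - 1 is a WS token exactly when whitespace follows the dot. The Java sketch below (Java 16+) simulates that lookup on a plain list of tokens. The class, the token-type constants and the tiny tokenizer are hypothetical stand-ins for what ANTLR generates, not ANTLR's runtime API. Note that the check as written only looks after the '.', so C .D still gets through; noWsAroundDot is an untested extension that also checks the token directly before the dot.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates the answer's predicate on a token buffer that, like ANTLR's
// CommonTokenStream, still contains the hidden WS tokens.
class DotPredicateSketch {
    static final int DEFAULT = 0, HIDDEN = 1;
    // Hypothetical token-type constants standing in for the generated ones.
    static final int IDENTIFIER = 1, DOT = 2, WS = 3;

    record Tok(int type, int channel) {}

    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Minimal tokenizer: one hidden WS token per whitespace character, '.' as DOT,
    // and any other run of characters as a single IDENTIFIER (no validation).
    static List<Tok> lex(String src) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (isWs(c)) {
                out.add(new Tok(WS, HIDDEN));
            } else if (c == '.') {
                out.add(new Tok(DOT, DEFAULT));
            } else {
                while (i + 1 < src.length()
                        && src.charAt(i + 1) != '.'
                        && !isWs(src.charAt(i + 1))) {
                    i++;
                }
                out.add(new Tok(IDENTIFIER, DEFAULT));
            }
        }
        return out;
    }

    // Where _input.index() would point after consuming the token before `from`:
    // the next default-channel token, or the end of the buffer.
    static int nextOnChannel(List<Tok> buf, int from) {
        int i = from;
        while (i < buf.size() && buf.get(i).channel() != DEFAULT) i++;
        return i;
    }

    // The answer's check, evaluated just after the '.' at dotIndex has been matched:
    // _input.get(_input.index() - 1).getType() != WS
    static boolean noWsAfterDot(List<Tok> buf, int dotIndex) {
        int index = nextOnChannel(buf, dotIndex + 1);
        return buf.get(index - 1).type() != WS;
    }

    // Hypothetical extension: also reject a WS token directly before the '.'.
    static boolean noWsAroundDot(List<Tok> buf, int dotIndex) {
        boolean wsBefore = dotIndex > 0 && buf.get(dotIndex - 1).type() == WS;
        return !wsBefore && noWsAfterDot(buf, dotIndex);
    }

    // True if every '.' in src passes the chosen check.
    static boolean acceptsAll(String src, boolean checkBothSides) {
        List<Tok> buf = lex(src);
        for (int i = 0; i < buf.size(); i++) {
            if (buf.get(i).type() != DOT) continue;
            boolean ok = checkBothSides ? noWsAroundDot(buf, i) : noWsAfterDot(buf, i);
            if (!ok) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : List.of("C.D", "C. D", "C .D")) {
            System.out.println(s + " -> after only: " + acceptsAll(s, false)
                    + ", both sides: " + acceptsAll(s, true));
        }
    }
}
```

Running main prints "C.D -> after only: true, both sides: true", "C. D -> after only: false, both sides: false" and "C .D -> after only: true, both sides: false", which shows the gap in the after-only check.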

answered Sep 30 '22 09:09 by CoronA