Here is the input I am trying to build an AST for:
{{ name }}
{{ name | option }}
{{ name | option1 | option2 }}
{{ name | key=value }}
{{ name | option1 | key=value }}
{{ name | option1 | {{ another }} | option3 }}
So in practice there is always a name (characters a-z, A-Z, 0-9), and the options are sometimes in key=value form and sometimes plain, without a value.
I am trying to write a lexer/parser grammar for it with ANTLR, but it keeps complaining about various things. Here is my best attempt:
start : box+;
box : '{{' Name ('|' Options )* '}}';
Options : (SimpleOption | KeyValue | box);
Name : ID;
SimpleOption: ID;
KeyValue: ID '=' ID;
fragment
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
WS : ( ' ' | '\t' | '\r' | '\n' ) {$channel=HIDDEN;} ;
This is obviously wrong, because Name and SimpleOption are ambiguous: both match exactly the same text. Inlining the alternatives does not help either:
box : '{{' Name ('|' (ID | KeyValue | box) )* '}}';
It never picks up KeyValue and throws a mismatch exception as soon as it encounters '='.
How would you write this grammar?
You're using far too many lexer rules. As a lexer rule, KeyValue will only match ID '=' ID with no spaces around the = sign. It should be a parser rule (starting with a lowercase letter): only then can there be spaces around the =, because the lexer discards them before the parser sees the tokens.
Be sure you understand the difference between lexer- and parser rules! See: Practical difference between parser rules and lexer rules in ANTLR?
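To make the distinction concrete, here is a small Python sketch (not ANTLR; `KEY_VALUE_TOKEN`, `tokenize` and `is_key_value` are names I made up for illustration). It imitates the two approaches: one lexer-style pattern for the whole `ID '=' ID`, versus a lexer that emits small tokens and drops whitespace so that a parser-style check sees `ID`, `=`, `ID` separately.

```python
import re

ID = r"[A-Za-z_][A-Za-z0-9_]*"

# Lexer-rule style: the whole key=value pair is a single token, so the
# pattern leaves no room for whitespace around '='.
KEY_VALUE_TOKEN = re.compile(rf"{ID}={ID}")

# Parser-rule style: the lexer only knows ID and '=' and skips whitespace.
TOKEN = re.compile(rf"\s*(?:(?P<ID>{ID})|(?P<EQ>=))")


def tokenize(text):
    """Return a list of (kind, text) tokens, skipping whitespace."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def is_key_value(tokens):
    """Parser-level check: ID '=' ID as three separate tokens."""
    return [kind for kind, _ in tokens] == ["ID", "EQ", "ID"]
```

With this sketch, `KEY_VALUE_TOKEN.fullmatch("k = v")` finds nothing, while `is_key_value(tokenize("k = v"))` succeeds, which is the same effect as moving key_value from the lexer to the parser.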
This should do it:
grammar T;
start : box+ EOF;
box : '{{' ID ('|' opts)* '}}';
opts : key_value | ID | box; // note that 'options' is a reserved word in ANTLR!
key_value : ID '=' ID;
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*;
WS : (' ' | '\t' | '\r' | '\n') {skip();};
which would parse the input
{{ name | option1 = value1 | {{ another | k=v }} | option3 }}
as follows:
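As a rough illustration of the resulting tree shape, here is a hand-written recursive-descent sketch in Python that mirrors the rules start, box, opts and key_value. It is not ANTLR output: the `Parser` class, the `tokenize` helper and the tuple-based node format are assumptions of mine, chosen only to show the nesting.

```python
import re

# '{{', '}}', '|', '=' or an ID; leading whitespace is skipped.
TOKEN = re.compile(r"\s*(\{\{|\}\}|\||=|[A-Za-z_][A-Za-z0-9_]*)")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def expect(self, literal):
        if self.peek() != literal:
            raise SyntaxError(f"expected {literal!r}, got {self.peek()!r}")
        self.pos += 1

    def ident(self):
        tok = self.peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError(f"expected ID, got {tok!r}")
        self.pos += 1
        return tok

    # start : box+ EOF;
    def start(self):
        boxes = [self.box()]
        while self.peek() is not None:
            boxes.append(self.box())
        return boxes

    # box : '{{' ID ('|' opts)* '}}';
    def box(self):
        self.expect("{{")
        name = self.ident()
        options = []
        while self.peek() == "|":
            self.pos += 1
            options.append(self.opts())
        self.expect("}}")
        return ("box", name, options)

    # opts : key_value | ID | box;
    def opts(self):
        if self.peek() == "{{":
            return self.box()
        if self.peek(1) == "=":  # one token of lookahead picks key_value
            key = self.ident()
            self.expect("=")
            return ("key_value", key, self.ident())
        return ("option", self.ident())
```

Calling `Parser(tokenize(text)).start()` on the input above returns a one-element list containing

    ("box", "name", [("key_value", "option1", "value1"),
                     ("box", "another", [("key_value", "k", "v")]),
                     ("option", "option3")])

so the nested box ends up as a child of the outer box's option list, and spaces around = make no difference.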
