I have a simple grammar as follows:
grammar SampleConfig;
line: ID (WS)* '=' (WS)* string;
ID: [a-zA-Z]+;
string: '"' (ESC|.)*? '"' ;
ESC : '\\"' | '\\\\' ; // 2-char sequences \" and \\
WS: [ \t]+ -> skip;
The spaces in the input are completely ignored, including those in the string literal.
final String input = "key = \"value with spaces in between\"";
final SampleConfigLexer l = new SampleConfigLexer(new ANTLRInputStream(input));
final SampleConfigParser p = new SampleConfigParser(new CommonTokenStream(l));
final LineContext context = p.line();
System.out.println(context.getChildCount() + ": " + context.getText());
This prints the following output:
3: key="valuewithspacesinbetween"
However, I expected the whitespace inside the string literal to be retained, i.e.
3: key="value with spaces in between"
Is it possible to correct the grammar to achieve this behavior, or should I just override CommonTokenStream to ignore whitespace during parsing?
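You shouldn't expect any spaces in parser rules, because you're skipping them in the lexer. The skip command discards every WS token before the parser sees it, and since string is a parser rule, that includes the spaces between the quotes.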
Either remove the skip command or make string a lexer rule:
STRING : '"' ( '\\' [\\"] | ~[\\"\r\n] )* '"';
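Putting that together, a minimal sketch of the whole grammar with string turned into a lexer rule might look like the following. It assumes whitespace outside string literals is never significant, so the (WS)* references in line are dropped; that simplification is an assumption of this sketch, not part of the original grammar.

```antlr
grammar SampleConfig;

// WS is skipped by the lexer, so the parser rule never needs to mention it.
line   : ID '=' STRING ;

ID     : [a-zA-Z]+ ;

// The whole literal is a single token, so spaces inside the quotes are kept.
// Escapes (\" and \\) are handled inline, which replaces the separate ESC rule.
STRING : '"' ( '\\' [\\"] | ~[\\"\r\n] )* '"' ;

// Only whitespace outside of STRING tokens reaches this rule.
WS     : [ \t]+ -> skip ;
```

With this grammar, the test code from the question should print 3: key="value with spaces in between", because the spaces between the quotes are consumed as part of the STRING token and never become separate WS tokens.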