I have a very simple grammar that tries to match 'é' to the token E_CODE.
I tested it with the TestRig tool (using the -tokens option), but the parser can't match it correctly.
My input file was encoded in UTF-8 without a BOM, and I used ANTLR version 4.4.
Could somebody else check this? I got this output on my console:
line 1:0 token recognition error at: 'Ă'
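The 'Ă' in that error is consistent with the two UTF-8 bytes of 'é' (0xC3 0xA9) being decoded with a single-byte Central European code page instead of UTF-8. The sketch below assumes windows-1250 was the JVM's default charset, which the question does not say (ISO-8859-2 would also map 0xC3 to 'Ă'). The class and method names are hypothetical, chosen only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how UTF-8 bytes turn into 'Ă' under a wrong charset.
public class MojibakeDemo {
    // Encode the text as UTF-8, then decode those bytes with another charset.
    static String decodeUtf8BytesAs(String text, String charsetName) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8); // U+00E9 -> 0xC3 0xA9
        return new String(utf8, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String e = "\u00E9"; // escape used so the demo does not depend on javac's source encoding
        String wrong = decodeUtf8BytesAs(e, "windows-1250"); // assumed default charset
        String right = decodeUtf8BytesAs(e, "UTF-8");
        // Print code points so the result does not depend on the console encoding.
        System.out.printf("windows-1250: U+%04X U+%04X%n", (int) wrong.charAt(0), (int) wrong.charAt(1));
        System.out.printf("UTF-8:        U+%04X%n", (int) right.charAt(0));
    }
}
```

On a standard JDK this prints `U+0102 U+00A9` (that is, "Ă©") for windows-1250 and `U+00E9` for UTF-8, so the lexer sees 'Ă' as its first character and reports a token recognition error.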
grammar Unicode;
stat:EOF;
E_CODE: '\u00E9' | 'é';
I tested the grammar:
grammar Unicode;
stat: E_CODE* EOF;
E_CODE: '\u00E9' | 'é';
as follows:
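import org.antlr.v4.runtime.*;

// "\u00E9" and the literal é are the same code point, so both should lex as E_CODE
UnicodeLexer lexer = new UnicodeLexer(new ANTLRInputStream("\u00E9é"));
UnicodeParser parser = new UnicodeParser(new CommonTokenStream(lexer));
System.out.println(parser.stat().getText());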
and the following got printed to my console:
éé<EOF>
Tested with 4.2 and 4.3 (4.4 isn't in Maven Central yet).
Looking at the source, I see that TestRig takes an optional -encoding parameter. Have you tried setting it to UTF-8?
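If -encoding is not given, TestRig presumably reads the file with the JVM's default charset, which would explain the mangled 'é' when that default is not UTF-8. When driving the lexer from your own code, a minimal sketch is to decode the file explicitly as UTF-8 and pass the resulting String to the generated lexer. `Utf8Input` and `readUtf8` are hypothetical names; the ANTLR hand-off is shown only as a comment because it needs the generated `UnicodeLexer`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: decode an input file as UTF-8 regardless of the platform default.
public class Utf8Input {
    static String readUtf8(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, StandardCharsets.UTF_8); // explicit charset, no default fallback
    }

    public static void main(String[] args) throws IOException {
        String text = readUtf8(Paths.get(args[0]));
        // The decoded text could then be lexed, e.g.:
        //   new UnicodeLexer(new ANTLRInputStream(text))
        // Here we just print the code points to confirm 'é' arrived as U+00E9.
        text.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

For a UTF-8 file containing only 'é', this should print `U+00E9`. With TestRig itself, passing `-encoding UTF-8` should have the same effect without any extra code.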