I'm trying out antlr4 with a somewhat large grammar that worked in antlr3. I worked through the 2 grammar changes that were needed, and the tool now produces the lexer and parser.
However, the lexer has a compile error:
1) The type generates a string that requires more than 65535 bytes to encode in Utf8 format in the constant pool
The error shows up in Eclipse on the class name, so I'm not sure exactly which string it is talking about, but I suspect it is this very long String:
public static final String _serializedATN =
"\1\2\u01c5\u1741\6\uffff\2\0\7\0\2\1\7\1\2\2\7\2\2\3\7\3\2\4\7\4\2\5\7"+
"\5\2\6\7\6\2\7\7\7\2\b\7\b\2\t\7\t\2\n\7\n\2\13\7\13\2\f\7\f\2\r\7\r\2"+
... etc, etc (few hundred lines of unicode)
This looks like a bug in the parser generator, but it's possible there is some new setting required for antlr4 that I'm not aware of (?)
This is really a limitation in Java, not a bug in ANTLR: the correct serialization string is created, but a single string constant in a Java class file cannot exceed 65535 bytes once encoded. Last week we tweaked the _serializedATN representation to help with this problem, but we have not implemented a complete workaround, such as breaking the serialized form into multiple strings or storing it in a separate file loaded at runtime.
There may be some ways to tweak the grammar to reduce the size of the required ATN, but I would need to see the grammar to evaluate that.
Update: Starting with ANTLR 4.1, _serializedATN is split as necessary to ensure the constant pool limit is not exceeded in the generated code. See issue 76 for details.
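The general idea of splitting one oversized string into several constants can be sketched roughly as follows. This is only an illustration, not ANTLR's actual generated code: the class and method names (SerializedAtnChunks, modifiedUtf8Length, split, join) are hypothetical. It assumes the limit applies to the class file's modified UTF-8 encoding, where \u0000 takes 2 bytes and characters from \u0800 upward take 3 bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of ANTLR.
final class SerializedAtnChunks {
    // A single string constant in a class file stores its length in 16 bits.
    static final int MAX_UTF8_BYTES = 65535;

    // Number of bytes one char occupies in the class file's modified UTF-8.
    static int modifiedUtf8Length(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1;
        }
        if (c <= 0x07FF) {
            return 2; // also covers \u0000, which is stored as two bytes
        }
        return 3;     // surrogates are encoded individually, 3 bytes each
    }

    // Split s into pieces whose encoded size stays within maxBytes each.
    static List<String> split(String s, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int n = modifiedUtf8Length(c);
            if (bytes + n > maxBytes && current.length() > 0) {
                chunks.add(current.toString());
                current.setLength(0);
                bytes = 0;
            }
            current.append(c);
            bytes += n;
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    // Reassemble the pieces at runtime, where the limit no longer applies.
    static String join(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 50000; i++) {
            big.append('\uffff'); // 3 bytes each, far above the limit in total
        }
        List<String> chunks = split(big.toString(), MAX_UTF8_BYTES);
        System.out.println("chunks: " + chunks.size());
        System.out.println("round trip ok: " + join(chunks).equals(big.toString()));
    }
}
```

Note that in generated source the pieces have to be joined at runtime (for example through a method call such as join above). Concatenating string literals directly with + yields a compile-time constant, which the compiler folds back into one oversized string.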