 

The type generates a string that requires more than 65535 bytes to encode in Utf8 format in the constant pool

Tags:

antlr4

I'm trying out ANTLR 4 with a somewhat large grammar that worked in ANTLR 3. I worked through the two grammar changes it needed, and the tool now produces the lexer and parser.

However, the lexer has a compile error:

1) The type generates a string that requires more than 65535 bytes to encode in Utf8 format in the constant pool

The error shows up in Eclipse on the class name, so I'm not sure exactly which string it is talking about, but I suspect it is this very long String:

    public static final String _serializedATN =
        "\1\2\u01c5\u1741\6\uffff\2\0\7\0\2\1\7\1\2\2\7\2\2\3\7\3\2\4\7\4\2\5\7"+
        "\5\2\6\7\6\2\7\7\7\2\b\7\b\2\t\7\t\2\n\7\n\2\13\7\13\2\f\7\f\2\r\7\r\2"+
... etc, etc (few hundred lines of unicode)

This looks like a bug in the parser generator, but it's possible there is some new setting required for ANTLR 4 that I'm not aware of.

asked Jan 17 '13 14:01 by joehitt



1 Answer

This is really a limitation in Java, not a bug in ANTLR: the correct serialization string is created, but a Java class file can't store a single string constant that needs more than 65535 bytes in its UTF-8 encoding. Last week we tweaked the _serializedATN representation to help with this problem, but we have not implemented a complete workaround involving breaking the serialized form into multiple strings or allowing its storage in a separate file loaded at runtime.
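To see whether a given string would hit the limit, you can count the bytes it needs in the class file's modified UTF-8 encoding. The following is a minimal sketch, not part of ANTLR; the class and method names (`ConstantPoolSize`, `modifiedUtf8Length`, `fitsInConstantPool`) are made up for illustration. Note that `\0` costs two bytes and anything above `\u07ff` costs three, so a string with far fewer than 65535 characters can still be too large.

```java
final class ConstantPoolSize {
    /** Largest byte length a single string constant may have in a class file. */
    static final int MAX_CONSTANT_BYTES = 65535;

    /** Bytes needed to store s in the class file's modified UTF-8 encoding. */
    static int modifiedUtf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;          // plain ASCII except NUL
            } else if (c <= 0x07FF) {
                bytes += 2;          // NUL and U+0080..U+07FF
            } else {
                bytes += 3;          // everything else, including each surrogate
            }
        }
        return bytes;
    }

    /** True if s can be stored as one string constant. */
    static boolean fitsInConstantPool(String s) {
        return modifiedUtf8Length(s) <= MAX_CONSTANT_BYTES;
    }

    public static void main(String[] args) {
        // First few characters of the string from the question.
        String sample = "\1\2\u01c5\u1741\6\uffff\2\0";
        System.out.println(sample.length() + " chars, "
                + modifiedUtf8Length(sample) + " encoded bytes");
    }
}
```

Running a check like this over the generated `_serializedATN` value (for example from a small throwaway program that builds the string at runtime) would tell you how far over the limit the grammar is.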

There may be some ways to tweak the grammar to reduce the size of the required ATN, but I would need to see the grammar to evaluate that.

Update: Starting with ANTLR 4.1, _serializedATN is now split as necessary to ensure the constant pool limit is not exceeded in the generated code. See issue 76 for details.
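The general idea behind splitting can be sketched as follows. This is an illustration under my own assumptions, not the code ANTLR actually generates; the names `SegmentedConstant`, `split`, `SEGMENT_0`, `SEGMENT_1`, and `SERIALIZED` are hypothetical. Each segment stays under the byte limit as its own constant, and the pieces are joined through a method call at class-initialization time. A plain `SEGMENT_0 + SEGMENT_1` would not help, because the compiler folds a concatenation of constants back into one oversized constant.

```java
import java.util.ArrayList;
import java.util.List;

final class SegmentedConstant {
    /** Bytes one char needs in the class file's modified UTF-8 encoding. */
    static int encodedBytes(char c) {
        if (c >= 0x0001 && c <= 0x007F) return 1;
        if (c <= 0x07FF) return 2; // includes NUL
        return 3;
    }

    /** Splits s into consecutive pieces of at most maxBytes encoded bytes each. */
    static List<String> split(String s, int maxBytes) {
        if (maxBytes < 3) {
            throw new IllegalArgumentException("maxBytes must be at least 3");
        }
        List<String> segments = new ArrayList<>();
        int start = 0;
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            int n = encodedBytes(s.charAt(i));
            if (bytes + n > maxBytes) {
                // Current segment is full: close it and start a new one at i.
                segments.add(s.substring(start, i));
                start = i;
                bytes = 0;
            }
            bytes += n;
        }
        segments.add(s.substring(start));
        return segments;
    }

    // Hypothetical shape of generated code: every literal is a separate constant.
    private static final String SEGMENT_0 = "\1\2\u01c5";
    private static final String SEGMENT_1 = "\u1741\6\uffff";

    // String.join is a method call, so the result is built at runtime
    // instead of being folded into a single compile-time constant.
    static final String SERIALIZED = String.join("", SEGMENT_0, SEGMENT_1);

    public static void main(String[] args) {
        List<String> parts = split("abcdef", 4);
        System.out.println(parts);
        System.out.println(String.join("", parts).equals("abcdef"));
    }
}
```

A generator would run something like `split` with `maxBytes` at or below 65535 when writing out the source, then emit one field per segment plus the joining expression.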

answered Sep 18 '22 04:09 by Sam Harwell
