I am running this code on EMR 4.6.0 with Spark 1.6.1:
import org.apache.spark.sql.SQLContext

val sqlContext = SQLContext.getOrCreate(sc)
val inputRDD = sqlContext.read.json(input)
try {
  inputRDD.filter("`first_field` is not null OR `second_field` is not null").toJSON.coalesce(10).saveAsTextFile(output)
  logger.info("DONE!")
} catch {
  case e: Throwable => logger.error("ERROR: " + e.getMessage)
}
In the last stage of saveAsTextFile, it fails with this error:
16/07/15 08:27:45 ERROR codegen.GenerateUnsafeProjection: failed to compile: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM limit of 0xFFFF
/* 001 */
/* 002 */ public java.lang.Object generate(org.apache.spark.sql.catalyst.expressions.Expression[] exprs) {
/* 003 */ return new SpecificUnsafeProjection(exprs);
/* 004 */ }
(...)
What could be the reason? Thanks
I solved this problem by dropping all the unused columns from the DataFrame, i.e. by selecting only the columns I actually needed.
It turns out that Spark DataFrames cannot handle very wide schemas. There is no specific number of columns at which Spark breaks with "Constant pool has grown past JVM limit of 0xFFFF" - it depends on the kind of query - but reducing the number of columns can work around the issue.
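For example, here is a minimal sketch of that workaround applied to the code from the question (first_field and second_field come from the question; third_field is a placeholder for whichever other columns your job actually needs):

import org.apache.spark.sql.SQLContext

val sqlContext = SQLContext.getOrCreate(sc)
val df = sqlContext.read.json(input)

// Keep only the columns the job actually needs; the narrower schema keeps
// the generated projection code below the JVM's constant pool limit.
val slimmed = df.select("first_field", "second_field", "third_field")

slimmed
  .filter("`first_field` is not null OR `second_field` is not null")
  .toJSON
  .coalesce(10)
  .saveAsTextFile(output)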
The underlying root cause is the JVM's 64KB limit on generated Java classes - see also Andrew's answer.
This is due to a known Java limitation that prevents generated classes from growing beyond 64KB.
This limitation has been worked around in SPARK-18016, which is fixed in Spark 2.3, due to be released in January 2018.
For future reference, this issue was fixed in Spark 2.3 (as Andrew noted).
If you encounter this issue on Amazon EMR, upgrade to release version 5.13 or above.