Understanding Kryo serialization buffer overflow error

I am trying to understand the following error. I am running in client mode.

 org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 61186304. To avoid this, increase spark.kryoserializer.buffer.max value.
        at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:300)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Basically, I am trying to narrow down the problem. Is my understanding right that this error is occurring on the Spark driver side (I am on AWS EMR, so I believe the driver will be running on the master node)? And should I be looking at spark.driver.memory?

soupybionics asked Apr 01 '18 04:04





1 Answer

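No, the problem is that Kryo does not have enough room in its serialization buffer, so spark.driver.memory is not the setting to look at. The Executor$TaskRunner.run frame in your stack trace also suggests the overflow happens inside an executor task rather than in the driver. You should increase spark.kryoserializer.buffer.max in your properties file, or pass --conf "spark.kryoserializer.buffer.max=128m" to your spark-submit command. 128m should be big enough for you.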
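To see where a value like 128m could come from, here is a rough sketch in plain Python that reads the numbers out of the error message. The helper name suggest_kryo_buffer_max and its sizing heuristic are assumptions made for illustration; they are not part of Spark or Kryo. It assumes the buffer was still at Spark's default limit of 64m and relies on the documented rule that spark.kryoserializer.buffer.max must stay below 2048m.

```python
import re

# Spark's default for spark.kryoserializer.buffer.max is 64m.
DEFAULT_MAX_MIB = 64
# Spark requires spark.kryoserializer.buffer.max to be less than 2048m.
HARD_LIMIT_MIB = 2048


def suggest_kryo_buffer_max(message, current_max_mib=DEFAULT_MAX_MIB):
    """Hypothetical helper: suggest a larger buffer.max (in MiB) from the error text."""
    match = re.search(r"Available: (\d+), required: (\d+)", message)
    if match is None:
        raise ValueError("not a Kryo buffer overflow message")
    required_bytes = int(match.group(2))
    # Ceiling division: bytes -> whole MiB.
    required_mib = -(-required_bytes // (1024 * 1024))
    # Assumed heuristic: the buffer was already full ("Available: 0"), so
    # budget for the current limit plus the write that failed, then round up
    # to a power of two. This is only a lower bound; later writes for the
    # same object may need more.
    needed_mib = current_max_mib + required_mib
    suggestion = 1
    while suggestion < needed_mib:
        suggestion *= 2
    if suggestion >= HARD_LIMIT_MIB:
        raise ValueError("object too large for a single Kryo buffer")
    return suggestion
```

For the message in the question, required is 61186304 bytes, which is about 59 MiB. Added to the assumed 64 MiB that was already full, that is 123 MiB, which rounds up to 128. If the error comes back with the larger buffer, the object being serialized is bigger than this estimate and the value needs to go up again.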

Mike Pone answered Oct 12 '22 22:10
