I am trying to understand the following error. I am running in client mode.
org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 61186304. To avoid this, increase spark.kryoserializer.buffer.max value.
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:300)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Basically, I am trying to narrow down the problem.
Is my understanding right that this error is occurring on the Spark driver side (I am on AWS EMR, so I believe this will be running on the master node), and that I should be looking at spark.driver.memory?
Kryo is significantly faster and more compact than Java serialization (often as much as 10x), but does not support all Serializable types and requires you to register the classes you'll use in the program in advance for best performance. So it is not used by default because: Not every java.io. ...
Kryo is a fast and efficient binary object graph serialization framework for Java. The goals of the project are high speed, low size, and an easy to use API. The project is useful any time objects need to be persisted, whether to a file, database, or over the network.
Kryo serialization: Spark can also use the Kryo library (version 4) to serialize objects more quickly.
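No, the problem is that Kryo does not have enough room in its buffer; it is not a spark.driver.memory issue. Note also that the stack trace goes through Executor$TaskRunner.run, which suggests the failure happens in an executor while it serializes a task result, not in the driver itself. You should raise spark.kryoserializer.buffer.max, either in your properties file or by passing --conf "spark.kryoserializer.buffer.max=128m" in your spark-submit command. The error asks for 61186304 more bytes (about 58 MiB), so 128m should be big enough for you.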
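To make the sizing concrete, here is a rough sanity check in plain Python, not a Spark API. It converts the "required" byte count from the error message to MiB and compares it with a candidate buffer.max value. The helper name size_to_bytes is made up for illustration, and it assumes Spark's binary size units (1m = 1024k).

```python
# Rough sanity check, not part of Spark: compare the byte count from the
# error message with a candidate spark.kryoserializer.buffer.max value.

# Assumption: size suffixes are binary units, as in Spark's size properties.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def size_to_bytes(size: str) -> int:
    """Convert a size string such as '128m' to a number of bytes."""
    size = size.strip().lower()
    return int(size[:-1]) * UNITS[size[-1]]


required = 61186304                 # "required" value copied from the error
candidate = size_to_bytes("128m")   # value suggested in the answer

required_mib = required / 1024 ** 2
fits = required < candidate

print(f"required: {required_mib:.2f} MiB")
print(f"candidate: {candidate} bytes, large enough: {fits}")
```

Keep in mind that "required" is only the extra space Kryo needed at the moment it failed, so the full serialized object may be larger. If the error comes back with 128m, raise the value further; Spark's documentation states it must stay below 2048m.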