
How Kryo serializer allocates buffer in Spark

Please help me understand how the Kryo serializer allocates memory for its buffer.

My Spark app fails on a collect step when it tries to collect about 122 MB of data from the workers to the driver:

com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 57197
    at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
    at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:220)
    at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:206)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:29)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:18)
    at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:312)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
    at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:161)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)

This exception still occurs after I've increased driver memory to 3 GB, executor memory to 4 GB, and the Kryo serializer buffer size (I'm using Spark 1.3):

conf.set('spark.kryoserializer.buffer.mb', '256')
conf.set('spark.kryoserializer.buffer.max', '512')

I think I've set the buffer to be big enough, but my Spark app keeps crashing. How can I check what objects are using the Kryo buffer on an executor? Is there a way to clean it up?

asked Aug 11 '15 by vvladymyrov


1 Answer

In my case, the problem was using the wrong property name for the max buffer size.

Up to Spark 1.3 the property name is spark.kryoserializer.buffer.max.mb (note the ".mb" suffix). I had instead used the property name from the Spark 1.4 docs: spark.kryoserializer.buffer.max.

As a result, my Spark app silently fell back to the default value of 64 MB, which was not enough for the amount of data I was processing.

After I changed the property name to spark.kryoserializer.buffer.max.mb, my app worked fine.
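The version dependence above can be sketched as a small helper; kryo_max_buffer_key is a hypothetical function (not a Spark API), using the property names from the answer:

```python
def kryo_max_buffer_key(spark_version):
    """Return the Kryo max-buffer property name for a Spark version string.

    Spark <= 1.3 uses the ".mb" suffix (a plain number of megabytes);
    Spark >= 1.4 drops the suffix (and also accepts size strings like "512m").
    """
    major, minor = (int(x) for x in spark_version.split(".")[:2])
    if (major, minor) <= (1, 3):
        return "spark.kryoserializer.buffer.max.mb"
    return "spark.kryoserializer.buffer.max"
```

On Spark 1.3 you would then call, e.g., conf.set(kryo_max_buffer_key("1.3.1"), "512"); setting the 1.4-style key on 1.3 is simply ignored, which is why the default 64 MB stayed in effect.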

answered Sep 28 '22 by vvladymyrov