Do you benefit from the Kryo serializer when you use PySpark?

Question

I read that the Kryo serializer can provide faster serialization when used in Apache Spark. However, I'm using Spark through Python.

Do I still get notable benefits from switching to the Kryo serializer?

eliasah · Accepted Answer

Kryo won’t make a major impact on PySpark because PySpark just stores data as byte[] objects, which are fast to serialize even with Java serialization.

But it may be worth a try: you would just set the spark.serializer configuration and not register any classes.
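As a minimal sketch of what "just set the spark.serializer configuration" could look like from Python: the helper name `build_context` and the app name are made up for illustration, and the `pyspark` import is kept inside the function so the settings list can be inspected without a Spark installation.

```python
# Setting discussed above: switch the JVM-side serializer to Kryo.
# No classes are registered, as the answer suggests.
KRYO_CONF = [
    ("spark.serializer", "org.apache.spark.serializer.KryoSerializer"),
]


def build_context(app_name="kryo-experiment"):
    """Create a SparkContext with Kryo enabled (hypothetical helper)."""
    # Imported lazily; this part needs pyspark to be installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName(app_name).setAll(KRYO_CONF)
    return SparkContext(conf=conf)
```

The serializer has to be set before the context is created, so timing the same job with and without this setting is probably the easiest way to see whether it matters for your workload.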

What might make more impact is storing your data as MEMORY_ONLY_SER and enabling spark.rdd.compress, which will compress your data.

In Java, compression can add some CPU overhead, but Python runs quite a bit slower anyway, so the extra cost might not matter. It might also speed up computation by reducing GC or letting you cache more data.

Reference: Matei Zaharia's answer on the mailing list.
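A rough sketch of the compression setup, under some assumptions: the helper names are hypothetical, and `StorageLevel.MEMORY_ONLY` is used because PySpark always keeps Python data in serialized form, so recent versions may not expose a separate `MEMORY_ONLY_SER` constant (older 1.x releases did).

```python
# Setting discussed above: compress serialized RDD partitions.
COMPRESS_CONF = [
    ("spark.rdd.compress", "true"),
]


def build_compressing_context(app_name="rdd-compress-experiment"):
    """Create a SparkContext with RDD compression on (hypothetical helper)."""
    # Imported lazily; this part needs pyspark to be installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName(app_name).setAll(COMPRESS_CONF)
    return SparkContext(conf=conf)


def cache_serialized(rdd):
    """Persist an RDD in memory; on the Python side it is stored serialized."""
    from pyspark import StorageLevel

    # Assumption: on old PySpark versions you could pass
    # StorageLevel.MEMORY_ONLY_SER here instead.
    return rdd.persist(StorageLevel.MEMORY_ONLY)
```

With a context from `build_compressing_context()`, something like `cache_serialized(sc.parallelize(range(1000)))` would cache the partitions in compressed, serialized form.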
