I am already compressing RDDs with conf.set("spark.rdd.compress", "true") and persisting them with persist(MEMORY_AND_DISK_SER). Will using Kryo serialization make the program even more efficient, or is it not useful in this case? I know that Kryo is for sending data between nodes more efficiently. But if the communicated data is already compressed, is it even needed?
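For reference, that setup looks roughly like this (a minimal sketch, assuming Scala and a plain SparkContext; the app name and input path are illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf().setAppName("compressed-rdds")
conf.set("spark.rdd.compress", "true") // compress serialized RDD partitions
val sc = new SparkContext(conf)

val rdd = sc.textFile("hdfs://...") // illustrative input path
rdd.persist(StorageLevel.MEMORY_AND_DISK_SER) // store partitions serialized, spilling to disk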
Kryo is a fast and efficient binary object graph serialization framework for Java. The goals of the project are high speed, low size, and an easy-to-use API. The project is useful any time objects need to be persisted, whether to a file or database, or sent over the network.
Serialization matters for performance tuning in Apache Spark: all data that is sent over the network, written to disk, or persisted in memory must be serialized, so serialization cost shows up directly in expensive operations such as shuffles, spills, and caching.
Kryo is significantly faster and more compact than Java serialization (often as much as 10x), but it does not support all Serializable types and requires you to register the classes you'll use in the program in advance for best performance. It is not the default because not every java.io.Serializable class is supported out of the box, and registration requires extra setup.
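As a sketch, registration looks like this (the Purchase class here is hypothetical; registerKryoClasses is the standard SparkConf method for this):

import org.apache.spark.SparkConf

case class Purchase(id: Long, amount: Double) // hypothetical application class

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// Registration lets Kryo write a compact numeric ID instead of the full class name
conf.registerKryoClasses(Array(classOf[Purchase]))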
Kryo won't make a major impact on PySpark, because PySpark just stores data as byte[] objects, which are fast to serialize even with Java. But it may be worth a try: set the spark.serializer configuration and try not registering any classes.
Both of the RDD states you described (compressed and persisted) use serialization. When you persist an RDD, you are serializing it and saving it to disk (in your case, compressing the serialized output as well). You are right that serialization is also used for shuffles (sending data between nodes): any time data needs to leave a JVM, whether it's going to local disk or through the network, it needs to be serialized.
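To make that concrete, here is a hedged sketch marking where serialization kicks in (variable names are illustrative; sc is the SparkContext from the snippet above):

import org.apache.spark.storage.StorageLevel

val pairs = sc.parallelize(1 to 1000000).map(i => (i % 100, i.toLong))

// Persist: each partition is serialized (and, with spark.rdd.compress=true, also compressed)
pairs.persist(StorageLevel.MEMORY_AND_DISK_SER)

// Shuffle: reduceByKey moves records between executors, so they are serialized in transit
val sums = pairs.reduceByKey(_ + _)
sums.count()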
Kryo is a significantly optimized serializer, and it performs better than the standard Java serializer for just about everything. In your case, you may actually be using Kryo already. You can check your Spark configuration parameter:
"spark.serializer" should be "org.apache.spark.serializer.KryoSerializer".
If it's not, then you can set it in your configuration with:
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
Regarding your last question ("is it even needed?"): it's hard to make a general claim. Kryo optimizes one of the slow steps in communicating data, but it's entirely possible that in your use case other steps are the bottleneck. There's no downside to trying Kryo and benchmarking the difference!
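One minimal way to benchmark, assuming you can run the same job once under each serializer setting (the timing helper and rdd reference are illustrative):

// Illustrative timing helper; rerun the whole job per serializer setting
def time[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(s"$label: ${(System.nanoTime() - start) / 1e6} ms")
  result
}

time("count with persisted RDD") {
  rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
  rdd.count() // materializes the persist, so serialization cost is included
}

The Storage tab of the Spark web UI also shows the in-memory and on-disk size of each persisted RDD, which makes the size difference between the two serializers easy to compare.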