
Difference when serializing a lazy val with or without @transient

When working with Spark, I sometimes need to use a non-serializable object in each task.

A common pattern is @transient lazy val, e.g.:

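import org.apache.spark.rdd.RDD

// Not Serializable: stands in for an object that cannot be shipped to executors.
class A(val a: Int)

def compute(rdd: RDD[Int]) = {
  // Variant without the annotation:
  // lazy val instance = {
  @transient lazy val instance = {
    println("in lazy object")
    new A(1)
  }
  // The closure passed to map references instance, so it is captured by the task.
  val res = rdd.map(instance.a + _).count()
  println(res)
}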

compute(sc.makeRDD(1 to 100, 8))

I found that @transient is not necessary here: the lazy val alone already creates the non-serializable object when each task is executed. Still, people recommend using @transient.

  1. What is the advantage of marking a lazy val that has not been initialized yet as @transient when it is serialized?

  2. Does it make sense to make an uninitialized val transient for serialization, knowing that nothing will be serialized anyway, as in the example above?

  3. How is a @transient lazy val serialized? Is it treated as a method or as something else?

Some details on how a @transient lazy val is serialized, and on the compiled Java bytecode, would be great.

Hao Ren asked Jan 13 '16 14:01



1 Answer

See here: http://fdahms.com/2015/10/14/scala-and-the-transient-lazy-val-pattern/

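In Scala, lazy val denotes a field that is only calculated when it is accessed for the first time and is then stored for future reference. With @transient, on the other hand, one can denote a field that shall not be serialized. Combined, the two mean that the value is left out when the enclosing object is serialized and is recomputed on first access after deserialization.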
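To make the difference concrete, here is a minimal sketch that uses plain Java serialization instead of Spark. It assumes the lazy val is a field of a serializable class (the usual form of the pattern), not a local variable inside a method as in the question, where the behavior may differ. The names Resource, PlainHolder, TransientHolder and TransientLazyValDemo are hypothetical and exist only for this illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, NotSerializableException}
import java.io.{ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a resource that is not Serializable.
class Resource(val a: Int)

// Serializable holder with a plain lazy val field.
class PlainHolder extends Serializable {
  lazy val instance: Resource = new Resource(1)
}

// Serializable holder with a @transient lazy val field.
class TransientHolder extends Serializable {
  @transient lazy val instance: Resource = new Resource(1)
}

object TransientLazyValDemo {
  // Serializes the value to a byte array and reads it back.
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // True if the value survives a round trip, false on NotSerializableException.
  def serializes(value: AnyRef): Boolean =
    try { roundTrip(value); true }
    catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    // Never accessed: the backing field is still null, so nothing
    // non-serializable is written.
    val plainFresh = new PlainHolder
    println(serializes(plainFresh)) // true

    // Accessed once: the field now holds a Resource, which cannot be written.
    val plainUsed = new PlainHolder
    println(plainUsed.instance.a) // forces initialization
    println(serializes(plainUsed)) // false

    // Accessed once, but the field is transient: it is skipped on write
    // and recomputed on first access in the deserialized copy.
    val transientUsed = new TransientHolder
    println(transientUsed.instance.a) // forces initialization
    val copy = roundTrip(transientUsed)
    println(copy.instance.a) // 1
  }
}
```

Under these assumptions, the plain lazy val is only safe as long as nothing touches it before the object is serialized, while the @transient variant stays serializable even after the value has been initialized on the sending side.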

David Ahern answered Sep 19 '22 16:09