Working with Spark, I sometimes need to use a non-serializable object in each task.
A common pattern is @transient lazy val, e.g.:
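import org.apache.spark.rdd.RDD

// Deliberately not Serializable
class A(val a: Int)

def compute(rdd: RDD[Int]) = {
  // lazy val instance = {
  @transient lazy val instance = {
    println("in lazy object")
    new A(1)
  }
  val res = rdd.map(instance.a + _).count()
  println(res)
}

// sc is a SparkContext, e.g. the one spark-shell provides
compute(sc.makeRDD(1 to 100, 8))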
I found that @transient is not necessary here. A plain lazy val can already create the non-serializable object when each task is executed. But people suggest using @transient.
What is the advantage of setting @transient on a non-initialized lazy val when it is serialized?
Does it make sense to make a non-initialized val transient for serialization, knowing that nothing will be serialized, just like in the example above?
How is a @transient lazy val serialized? Is it treated as a method or something else?
Some details on how a @transient lazy val is serialized, and on the compiled Java bytecode, would be awesome.
A data member declared transient is not serialized. The transient keyword tells the Java virtual machine (JVM) that the field is not part of the persistent state of an object.
This is where the @transient lazy val pattern comes in. In Scala, lazy val denotes a field that is only computed when it is accessed for the first time and is then stored for future reference. @transient, on the other hand, denotes a field that shall not be serialized.
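As a rough illustration of the two annotations working together, the sketch below round-trips an object through plain Java serialization (not Spark itself). The names Connection, Worker, Counter and TransientLazyDemo are hypothetical. The behaviour described assumes Scala 2.12 or 2.13, compiled as an ordinary source file.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a resource that cannot be serialized.
class Connection(val id: Int) // deliberately does not extend Serializable

object Counter {
  // Counts how often the lazy initializer has run in this JVM.
  var inits = 0
}

class Worker extends Serializable {
  // Skipped when the Worker is written to a stream;
  // rebuilt on first access after deserialization.
  @transient lazy val conn: Connection = {
    Counter.inits += 1
    new Connection(1)
  }
}

object TransientLazyDemo {
  // Serialize obj to a byte array and read it back.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

Forcing conn on the original, calling TransientLazyDemo.roundTrip, and then reading conn on the copy should run the initializer a second time, since neither the cached value nor the "already initialized" flag travels with the object.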
The transient keyword in Java is used to avoid serialization. If a field of a data structure is declared transient, it will not be serialized.
transient is a Java keyword which marks a member variable not to be serialized when it is persisted to streams of bytes. When an object is transferred through the network, the object needs to be 'serialized'. Serialization converts the object state to serial bytes.
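To make the difference concrete, here is a minimal sketch comparing a plain lazy val with a @transient lazy val when the value itself is not serializable. Handle, PlainHolder, TransientHolder and LazySerializationCheck are hypothetical names. The sketch uses class members and plain Java serialization, assuming Scala 2.12 or 2.13, so it does not necessarily mirror what Spark does with a local lazy val captured by a closure.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical non-serializable value.
class Handle(val id: Int)

class PlainHolder extends Serializable {
  // Once forced, the Handle becomes part of the object's serialized state.
  lazy val handle: Handle = new Handle(1)
}

class TransientHolder extends Serializable {
  // Never part of the serialized state, forced or not.
  @transient lazy val handle: Handle = new Handle(1)
}

object LazySerializationCheck {
  // True if Java serialization of obj succeeds,
  // false if it hits a non-serializable field.
  def serializes(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

Under these assumptions, a PlainHolder serializes while handle is still untouched but fails once handle has been accessed, whereas a TransientHolder serializes in both cases. That suggests @transient mainly guards against the value having been forced before serialization.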
see here - http://fdahms.com/2015/10/14/scala-and-the-transient-lazy-val-pattern/