In one of his videos (concerning Scala's lazy evaluation, namely lazy
keyword), Martin Odersky shows the following implementation of cons
operation used to construct a Stream
:
def cons[T](hd: T, tl: => Stream[T]) = new Stream[T] {
def head = hd
lazy val tail = tl
...
}
So tail
operation is written concisely using lazy evaluation feature of the language.
But in reality (in Scala 2.11.7), the implementation of tail
is a bit less elegant:
@volatile private[this] var tlVal: Stream[A] = _
@volatile private[this] var tlGen = tl _
def tailDefined: Boolean = tlGen eq null
override def tail: Stream[A] = {
if (!tailDefined)
synchronized {
if (!tailDefined) {
tlVal = tlGen()
tlGen = null
}
}
tlVal
}
Double-checked locking and two volatile fields: that's roughly how you would implement a thread-safe lazy computation in Java.
So the questions are:
lazy
keyword of Scala provide any 'evaluated maximum once' guarantee in a multi-threaded case?tail
implementation an idiomatic way to do a thread-safe lazy evaluation in Scala?Doesn't lazy keyword of Scala provide any 'evaluated maximum once' guarantee in a multi-threaded case?
Yes, it does, as others have stated.
Is the pattern used in real tail implementation an idiomatic way to do a thread-safe lazy evaluation in Scala?
I think I have the actual answer as to why not lazy val
. Stream
has public facing API methods such as hasDefinitionSize
inherited from TraversableOnce
. In order to know if a Stream
has a finite size not, we need a way of checking without materializing the underlying Stream
tail. Since lazy val
doesn't actually expose the underlying bit, we can't do that.
This is backed by SI-1220
To strengthen this point, @Jasper-M points out that the new LazyList
api in strawman (Scala 2.13 collection makeover) no longer has this issue, since the entire collection hierarchy has been reworked and there are no longer such concerns.
I would say "it depends" on which angle you're looking at this problem. From a LOB point of view, I'd say definitely go with lazy val
for conciseness and clarity of implementation. But, if you look at it from the point of view of a Scala collections library author, things start to look differently. Think of it this way, you're creating a library which will be potentially be used by many people and ran on many machines across the world. This means that you should be thinking of the memory overhead of each structure, especially if you're creating such an essential data structure yourself.
I say this because when you use lazy val
, by design you generate an additional Boolean
field which flags if the value has been initialized, and I am assuming this is what the library authors were aiming to avoid. The size of a Boolean
on the JVM is of course VM dependent, by even a byte is something to consider, especially when people are generating large Stream
s of data. Again, this is definitely not something I would usually consider and is definitely a micro optimization towards memory usage.
The reason I think performance is one of the key points here is SI-7266 which fixes a memory leak in Stream. Note how it is of importance to track the byte code to make sure no extra values are retained inside the generated class.
The difference in the implementation is that the definition of tail
being initialized or not is a method implementation which checks the generator:
def tailDefined: Boolean = tlGen eq null
Instead of a field on the class.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With