Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala Stream tail laziness and synchronization

In one of his videos (concerning Scala's lazy evaluation, namely lazy keyword), Martin Odersky shows the following implementation of cons operation used to construct a Stream:

def cons[T](hd: T, tl: => Stream[T]) = new Stream[T] {
  def head = hd
  lazy val tail = tl
  ...
}

So tail operation is written concisely using lazy evaluation feature of the language.

But in reality (in Scala 2.11.7), the implementation of tail is a bit less elegant:

@volatile private[this] var tlVal: Stream[A] = _
@volatile private[this] var tlGen = tl _
def tailDefined: Boolean = tlGen eq null
override def tail: Stream[A] = {
  if (!tailDefined)
    synchronized {
      if (!tailDefined) {
        tlVal = tlGen()
        tlGen = null
      }
    }

  tlVal
}

Double-checked locking and two volatile fields: that's roughly how you would implement a thread-safe lazy computation in Java.

So the questions are:

  1. Doesn't lazy keyword of Scala provide any 'evaluated maximum once' guarantee in a multi-threaded case?
  2. Is the pattern used in real tail implementation an idiomatic way to do a thread-safe lazy evaluation in Scala?
like image 728
Roman Puchkovskiy Avatar asked Jan 05 '18 10:01

Roman Puchkovskiy


1 Answers

Doesn't lazy keyword of Scala provide any 'evaluated maximum once' guarantee in a multi-threaded case?

Yes, it does, as others have stated.

Is the pattern used in real tail implementation an idiomatic way to do a thread-safe lazy evaluation in Scala?

Edit:

I think I have the actual answer as to why not lazy val. Stream has public facing API methods such as hasDefinitionSize inherited from TraversableOnce. In order to know if a Stream has a finite size not, we need a way of checking without materializing the underlying Stream tail. Since lazy val doesn't actually expose the underlying bit, we can't do that.

This is backed by SI-1220

To strengthen this point, @Jasper-M points out that the new LazyList api in strawman (Scala 2.13 collection makeover) no longer has this issue, since the entire collection hierarchy has been reworked and there are no longer such concerns.


Performance related concerns

I would say "it depends" on which angle you're looking at this problem. From a LOB point of view, I'd say definitely go with lazy val for conciseness and clarity of implementation. But, if you look at it from the point of view of a Scala collections library author, things start to look differently. Think of it this way, you're creating a library which will be potentially be used by many people and ran on many machines across the world. This means that you should be thinking of the memory overhead of each structure, especially if you're creating such an essential data structure yourself.

I say this because when you use lazy val, by design you generate an additional Boolean field which flags if the value has been initialized, and I am assuming this is what the library authors were aiming to avoid. The size of a Boolean on the JVM is of course VM dependent, by even a byte is something to consider, especially when people are generating large Streams of data. Again, this is definitely not something I would usually consider and is definitely a micro optimization towards memory usage.

The reason I think performance is one of the key points here is SI-7266 which fixes a memory leak in Stream. Note how it is of importance to track the byte code to make sure no extra values are retained inside the generated class.

The difference in the implementation is that the definition of tail being initialized or not is a method implementation which checks the generator:

def tailDefined: Boolean = tlGen eq null

Instead of a field on the class.

like image 188
Yuval Itzchakov Avatar answered Sep 28 '22 03:09

Yuval Itzchakov