Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Must access to scala.collection.immutable.List and Vector be synchronized?

Tags:

I'm going through Learning Concurrent Programming in Scala, and encountered the following:

In current versions of Scala, however, certain collections that are deemed immutable, such as List and Vector, cannot be shared without synchronization. Although their external API does not allow you to modify them, they contain non-final fields.

Tip: Even if an object seems immutable, always use proper synchronization to share any object between the threads.

From Learning Concurrent Programming in Scala by Aleksandar Prokopec, end of Chapter 2 (p.58), Packt Publishing, Nov 2014.

Can that be right?

My working assumption has always been that any internal mutability (to implement laziness, caching, whatever) in Scala library data structures described as immutable would be idempotent, such that the worst that might happen in a bad race is work would be unnecessarily duplicated. This author seems to suggest correctness may be imperiled by concurrent access to immutable structures. Is that true? Do we really need to synchronize access to Lists?

Much of my transition to an immutable-heavy style has been motivated by a desire to avoid synchronization and the potential contention overhead it entails. It would be an unhappy big deal to learn that synchronization cannot be eschewed for Scala's core "immutable" data structures. Is this author simply overconservative?

Scala's documentation of collections includes the following:

A collection in package scala.collection.immutable is guaranteed to be immutable for everyone. Such a collection will never change after it is created. Therefore, you can rely on the fact that accessing the same collection value repeatedly at different points in time will always yield a collection with the same elements.

That doesn't quite say that they are safe for concurrent access by multiple threads. Does anyone know of an authoritative statement that they are (or aren't)?

like image 813
Steve Waldman Avatar asked Apr 18 '15 06:04

Steve Waldman


People also ask

Are vectors immutable Scala?

Vector is an immutable data structure in Scala. It was introduced in Scala version 2.8 to address the inefficiency of random access of other existing data structures.

Is list immutable in Scala?

Specific to Scala, a list is a collection which contains immutable data, which means that once the list is created, then it can not be altered. In Scala, the list represents a linked list. In a Scala list, each element need not be of the same data type.

What is a vector in Scala?

Vector is a new collection type in Scala 2.8 that addresses the inefficiency for random access on lists. Vectors allow accessing any element of the list in "effectively" constant time. It's a larger constant than for access to the head of a list or for reading an element of an array, but it's a constant nonetheless.


1 Answers

It depends on where you share them:

  • it's not safe to share them inside scala-library
  • it's not safe to share them with Java-code, reflection

Simply saying, these collections are less protected than objects with only final fields. Regardless that they're same on JVM level (without optimization like ldc) - both may be fields with some mutable address, so you can change them with putfield bytecode command. Anyway, var is still less protected by the compiler, in comparision with java's final, scala's final val and val.

However, it's still fine to use them in most cases as their behaviour is logically immutable - all mutable operations are encapsulated (for Scala-code). Let's look at the Vector. It requires mutable fields to implement appending algorithm:

private var dirty = false

//from VectorPointer
private[immutable] var depth: Int = _
private[immutable] var display0: Array[AnyRef] = _
private[immutable] var display1: Array[AnyRef] = _
private[immutable] var display2: Array[AnyRef] = _
private[immutable] var display3: Array[AnyRef] = _
private[immutable] var display4: Array[AnyRef] = _
private[immutable] var display5: Array[AnyRef] = _

which is implemented like:

val s = new Vector(startIndex, endIndex + 1, blockIndex)
s.initFrom(this) //uses displayN and depth
s.gotoPos(startIndex, startIndex ^ focus) //uses displayN
s.gotoPosWritable //uses dirty
...
s.dirty = dirty

And s comes to the user only after method returned it. So it's not even concern of happens-before guarantees - all mutable operations are performed in the same thread (thread where you call :+, +: or updated), it's just kind of initialization. The only problem here is that private[somePackage] is accessible directly from Java code and from scala-library itself, so if you pass it to some Java's method it could modify them.

I don't think you should worry about thread-safety of let's say cons operator. It also has mutable fields:

final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B] {
  override def tail : List[B] = tl
  override def isEmpty: Boolean = false
}

But they used only inside library methods (inside one-thread) without any explicit sharing or thread creation, and they always return a new collection, let's consider take as an example:

override def take(n: Int): List[A] = if (isEmpty || n <= 0) Nil else {

    val h = new ::(head, Nil)
    var t = h
    var rest = tail
    var i = 1
    while ({if (rest.isEmpty) return this; i < n}) {
      i += 1
      val nx = new ::(rest.head, Nil)
      t.tl = nx //here is mutation of t's filed 
      t = nx
      rest = rest.tail
    }
    h
}

So here t.tl = nx is not much differ from t = nx in the meaning of thread-safety. They both are reffered only from the single stack (take's stack). Althrought, if I add let's say someActor ! t (or any other async operation), someField = t or someFunctionWithExternalSideEffect(t) right inside the while loop - I could break this contract.


A little addtion here about relations with JSR-133:

1) new ::(head, Nil) creates new object in the heap and puts its address (lets say 0x100500) into the stack(val h =)

2) as long as this address is in the stack, it's known only to the current thread

3) Other threads could be involved only after sharing this address by putting it into some field; in case of take it has to flush any caches (to restore the stack and registers) before calling areturn (return h), so returned object will be consistent.

So all operations on 0x100500's object are out of scope of JSR-133 as long as 0x100500 is a part of stack only (not heap, not other's stacks). However, some fields of 0x100500's object may point to some shared objects (which might be in scope JSR-133), but it's not the case here (as these objects are immutable for outside).


I think (hope) the author meant logical synchronization guarantees for library's developers - you still need to be careful with these things if you're developing scala-library, as these vars are private[scala], private[immutable] so, it's possible to write some code to mutate them from different threads. From scala-library developer's perspective, it usually means that all mutations on single instance should be applied in single thread and only on collection that invisible to a user (at the moment). Or, simply saying - don't open mutable fields for outer users in any way.

P.S. Scala had several unexpected issues with synchronization, which caused some parts of the library to be surprisely not thread-safe, so I wouldn't wonder if something may be wrong (and this is a bug then), but in let's say 99% cases for 99% methods immutable collections are thread safe. In worst case you might be pushed from usage of some broken method or just (it might be not just "just" for some cases) need to clone the collection for every thread.

Anyway, immutability is still a good way for thread-safety.

P.S.2 Exotic case which might break immutable collections' thread-safety is using reflection to access their non-final fields.


A little addition about another exotic but really terrifying way, as it pointed out in comments with @Steve Waldman and @axel22 (the author). If you share immutable collection as member of some object shared netween threads && if collection's constructor becomes physically (by JIT) inlined (it's not logically inlined by default) && if your JIT-implementation allows to rearrange inlined code with normal one - then you have to synchronize it (usually is enough to have @volatile). However, IMHO, I don't believe that last condition is a correct behaviour - but for now, can't neither prove nor disprove that.

like image 88
dk14 Avatar answered Sep 18 '22 03:09

dk14