
Scala: Parallel collection in object initializer causes a program to hang

I've just noticed a disturbing behavior. Let's say I have a standalone program consisting of a sole object:

object ParCollectionInInitializerTest {
  def doSomething { println("Doing something") }

  for (i <- (1 to 2).par) {
    println("Inside loop: " + i)
    doSomething
  }

  def main(args: Array[String]) {
  }
}

The program is perfectly innocent and, when the range used in the for loop is not a parallel one, executes properly, with the following output:

Inside loop: 1
Doing something
Inside loop: 2
Doing something

Unfortunately, when using the parallel collection, the program just hangs without ever invoking the doSomething method, so the output is as follows:

Inside loop: 2
Inside loop: 1

And then the program hangs.
Is this just a nasty bug? I'm using Scala 2.10.

Lukasz Gieron asked Mar 02 '13 15:03



1 Answer

This is an inherent problem in Scala that can occur when a reference to a singleton object is released before its construction is complete. Here, a different thread tries to access the object ParCollectionInInitializerTest before it has been fully constructed. It has nothing to do with the main method; rather, it has to do with initializing the object that contains the main method. Try this in the REPL by typing in the expression ParCollectionInInitializerTest, and you'll get the same result. It also has nothing to do with fork-join worker threads being daemon threads.

Singleton objects are initialized lazily, and each one can be initialized only once. This means that the first thread to access the object (in your case, the main thread) must acquire the object's initialization lock and then initialize it. Every thread that arrives afterwards must wait for the main thread to finish initializing the object and release the lock. This is how singleton objects are implemented in Scala.
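
To make the lazy, once-only initialization concrete, here is a minimal sketch. The names LazyInitDemo, Config and events are made up for illustration; the nested object's body should run only when Config is first touched, and not again on later accesses.

```scala
object LazyInitDemo {
  // Records side effects so the order of events can be inspected afterwards.
  val events = scala.collection.mutable.ArrayBuffer.empty[String]

  object Config {
    // This body is the initializer: it runs once, on first access to Config.
    events += "Config initialized"
    val answer = 42
  }

  def main(args: Array[String]): Unit = {
    events += "main started"
    val first = Config.answer   // first access triggers the initializer
    val second = Config.answer  // second access reuses the initialized object
    println(events.mkString(", "))  // prints: main started, Config initialized
  }
}
```

Note that "main started" is recorded before "Config initialized", even though Config is defined earlier in the source.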

In your case, the parallel collection worker thread tries to access the singleton object to invoke doSomething, but cannot do so until the main thread finishes initializing the object, so it waits. Meanwhile, the main thread waits in the constructor until the parallel operation completes, which requires all the worker threads to complete, and it holds the singleton's initialization lock the whole time. Hence, a deadlock occurs.
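
The waiting worker can be observed without hanging the program by bounding the wait. This is only a sketch, assuming the code is compiled as a standalone program; DeadlockProbe, DeadlockProbeMain and workerWasBlocked are hypothetical names, and the 500 ms timeout is arbitrary. Because the initializer gives up waiting after the timeout, it can finish, and only then should the worker be able to proceed.

```scala
object DeadlockProbe {
  def doSomething(): Unit = println("Doing something")

  // The worker needs the initialized DeadlockProbe in order to call doSomething().
  private val worker = new Thread {
    override def run(): Unit = doSomething()
  }

  worker.start()
  // Bounded wait instead of join(): a plain join() here would never return.
  worker.join(500)

  // If the worker is still alive after the timeout, it is waiting for this
  // initializer to finish rather than running doSomething().
  val workerWasBlocked: Boolean = worker.isAlive
}

object DeadlockProbeMain {
  def main(args: Array[String]): Unit = {
    // First access to DeadlockProbe runs its initializer on the main thread.
    println("Worker blocked during initialization: " + DeadlockProbe.workerWasBlocked)
  }
}
```

Replacing join(500) with join() turns this back into the deadlock described above.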

You can reproduce this behaviour with futures from Scala 2.10, or with plain threads, as shown below:

def execute(body: =>Unit) {
  val t = new Thread() {
    override def run() {
      body
    }
  }

  t.start()
  t.join()
}

object ParCollection {

  def doSomething() { println("Doing something") }

  execute {
    doSomething()
  }

}

Paste this into the REPL, and then write:

scala> ParCollection

and the REPL hangs.
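
One possible way around this, sketched under the assumption that the work does not have to happen during initialization, is to move the blocking call out of the object body and into a method such as main. ParCollectionFixed is a hypothetical name; the execute helper is the same idea as above. By the time the body of main runs, the object has already been initialized, so the worker thread has nothing to wait for.

```scala
object ParCollectionFixed {
  def doSomething(): Unit = println("Doing something")

  // Runs body on a new thread and waits for it to finish.
  def execute(body: => Unit): Unit = {
    val t = new Thread {
      override def run(): Unit = body
    }
    t.start()
    t.join()
  }

  def main(args: Array[String]): Unit = {
    // The object's initializer has completed before this point, so the
    // worker thread can call doSomething() without blocking.
    execute {
      doSomething()
    }
    println("Finished")
  }
}
```

The same reasoning should apply to the parallel loop in the question: placed inside main instead of the object body, it no longer runs while the initialization lock is held.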

axel22 answered Sep 29 '22 14:09
