
Scala: parallel collections not working?

Tags:

scala

I'm trying to use parallel collections in a very basic way via .par. I expect the collection to be acted on out of order, but that doesn't seem to be the case:

scala> (1 to 10) map println
1
2
3
4
5
6
7
8
9
10

and

scala> (1 to 10).par map println
1
2
3
4
5
6
7
8
9
10

It seems the order shouldn't be sequential in the latter case. This is with Scala 2.9, and my machine has 2 cores. Is this perhaps a misconfiguration somewhere? Thanks!

Edit: I did indeed try running with a large set (100k) and the result was still sequential.

Heinrich Schmetterling asked Jun 03 '11 01:06



1 Answer

YMMV:

scala> (1 to 10).par map println
1
6
2
3
4
7
5
8
9

This is on a dual core too...

I think if you try enough runs you may see different results. Here is a piece of code that shows some of what happens:

import collection.parallel._
import collection.parallel.immutable._

class ParRangeEx(range: Range) extends ParRange(range) {
  // Some minimal number of elements after which this collection 
  // should be handled sequentially by different processors.
  override def threshold(sz: Int, p:Int) = {
    val res = super.threshold(sz, p)
    printf("threshold(%d, %d) returned %d\n", sz, p, res)
    res
  }
  override def splitter = {
    new ParRangeIterator(range) 
        with SignalContextPassingIterator[ParRangeIterator] {
      override def split: Seq[ParRangeIterator] = {
        val res = super.split
        println("split " + res) // probably doesn't show further splits
        res
      }
    }
  }
}

new ParRangeEx((1 to 10)).par map println

On some runs I get interspersed processing; on others I get sequential processing. It seems to split the load in two. If you change the returned threshold to 11, you'll see that the workload is never split.
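
One way to check whether a run was actually split across threads, rather than inferring it from print order, is to record the worker thread name for each element. This is a minimal sketch, assuming Scala 2.9 to 2.12, where .par is part of the standard library (on 2.13+ it lives in the separate scala-parallel-collections module and needs import scala.collection.parallel.CollectionConverters._). The names ThreadPerElement and threadsFor are hypothetical.

```scala
object ThreadPerElement {
  // Hypothetical helper: pairs each element with the name of the thread
  // that processed it. map on a parallel sequence keeps result order,
  // so the pairs come back as 1..n even if processing was interleaved.
  def threadsFor(n: Int): Seq[(Int, String)] =
    (1 to n).par.map(i => (i, Thread.currentThread.getName)).seq

  def main(args: Array[String]): Unit = {
    val seen = threadsFor(10)
    seen.foreach { case (i, t) => println(i + " -> " + t) }
    // One distinct name means a single thread ran every chunk,
    // which is when the println output looks sequential.
    println("distinct threads: " + seen.map(_._2).distinct.size)
  }
}
```

If the count printed at the end is 1, the same thread handled both halves on that run; repeating the run may give a different count.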

The underlying scheduling mechanism is based on fork-join and work stealing; the JSR166 source code gives some insight. This is probably what determines whether the same thread picks up both tasks (so the output looks sequential) or two threads each take one task.
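
The fork-join behaviour can be sketched directly with java.util.concurrent.ForkJoinPool (Java 7+), the JDK version of the JSR166 work. This is not the parallel collections implementation, only an illustration under stated assumptions: RangeTask, ForkJoinSketch, and the threshold of 5 are hypothetical. Each leaf reports the thread that ran it; whether the forked half is stolen by a second worker or run by the same thread can vary from run to run.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

// Hypothetical task: splits [lo, hi) in half until a chunk has at most
// `threshold` elements, then records which thread handled the chunk.
class RangeTask(lo: Int, hi: Int, threshold: Int)
    extends RecursiveTask[List[(Range, String)]] {
  override def compute(): List[(Range, String)] =
    if (hi - lo <= threshold)
      List((lo until hi, Thread.currentThread.getName))
    else {
      val mid = lo + (hi - lo) / 2
      val left = new RangeTask(lo, mid, threshold)
      val right = new RangeTask(mid, hi, threshold)
      left.fork()               // queued; an idle worker may steal it
      val r = right.compute()   // current thread continues with the right half
      left.join() ::: r         // if nobody stole it, this thread runs left too
    }
}

object ForkJoinSketch {
  def run(lo: Int, hi: Int, threshold: Int): List[(Range, String)] = {
    val pool = new ForkJoinPool(2) // two workers, like a dual-core machine
    try pool.invoke(new RangeTask(lo, hi, threshold))
    finally pool.shutdown()
  }

  def main(args: Array[String]): Unit =
    run(1, 11, 5).foreach { case (r, t) =>
      println(r.mkString(",") + " on " + t)
    }
}
```

With a range of 10 elements and a threshold of 5 this splits once, into 1 to 5 and 6 to 10, mirroring the single split seen in the answer's output. Raising the threshold to 10 or more means the task is never split.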

Here is an example output on my computer:

threshold(10, 2) returned 1
split List(ParRangeIterator(over: Range(1, 2, 3, 4, 5)), 
  ParRangeIterator(over: Range(6, 7, 8, 9, 10)))
threshold(10, 2) returned 1
threshold(10, 2) returned 1
threshold(10, 2) returned 1
threshold(10, 2) returned 1
threshold(10, 2) returned 1
6
7
threshold(10, 2) returned 1
8
1
9
2
10
3
4
5
huynhjl answered Sep 28 '22 18:09
