Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How are Scala 2.9 parallel collections working behind the scenes?

Scala 2.9 introduced parallel collections. They are a really great tool for certain tasks. However, how do they work internally and am I able to influence the behavior/configuration?

What method do they use to figure out the optimal number of threads? If I am not satisfied with the result are there any configuration parameters to adjust?

I'm not only interested how many threads are actually created, I am also interested in the way how the actual work is distributed amongst them. How the results are collected and how much magic is going on behind the scenes. Does Scala somehow test if a collection is large enough to benefit from parallel processing?

like image 441
Steffen Avatar asked Jun 13 '11 10:06

Steffen


1 Answers

Briefly, there are two orthogonal aspects to how your operations are parallelized:

  1. The extent to which your collection is split into chunks (i.e. the size of the chunks) for a parallelizable operation (such as map or filter)
  2. The number of threads to use for the underlying fork-join pool (on which the parallel tasks are executed)

For #2, this is managed by the pool itself, which discovers the "ideal" level of parallelism at runtime (see java.lang.Runtime.getRuntime.availableProcessors)

For #1, this is a separate problem and the scala parallel collections API does this via the concept of work-stealing (adaptive scheduling). That is, when a particular piece of work is done, a worker will attempt to steal work from other work-queues. If none is available, this is an indication that all of the processors are very busy and hence a bigger chunk of work should be taken.

Aleksandar Prokopec, who implemented the library gave a talk at this year's ScalaDays which will be online shortly. He also gave a great talk at ScalaDays2010 where he describes in detail how the operations are split and re-joined (there are a number of issues that are not immediately obvious and some lovely bits of cleverness in there too!).

A more comprehensive answer is available in the PDF describing the parallel collections API.

like image 53
oxbow_lakes Avatar answered Oct 14 '22 18:10

oxbow_lakes