
How do I replace the fork join pool for a Scala 2.9 parallel collection?

I've been looking at the new Scala 2.9 parallel collections and am hoping to abandon a whole lot of my crufty amateur versions of similar things. In particular, I'd like to replace the fork join pool which underlies the default implementation with something of my own (for example, something that distributes evaluation of tasks across a network, via actors). My understanding is that this is simply a matter of applying Scala's paradigm of "stackable modifications", but the collections library is intimidating enough that I'm not exactly sure which bits need modifying!

Some concrete questions:

  1. Is it correct that the standard parallel implementations interact with the fork join pool solely through the code in ForkJoinTasks?
  2. I see that there's an alternative trait, FutureThreadPoolTasks. How would I build a collection which uses this trait instead of ForkJoinTasks?
  3. Can I just write yet another alternative (and perhaps a corresponding boilerplate class that mixes in AdaptiveWorkStealingTasks) and somehow instantiate collection instances that use this new trait?

(For reference, all of the traits mentioned above are defined in Tasks.scala.)

Code examples are especially welcome!

Scott Morrison asked May 18 '11 02:05




2 Answers

Just to provide some more information on how things fit together (which I suspect you already know): the fork-join pool is "plugged in" via the parallel package object's tasksupport value, which implements the scala.collection.parallel.TaskSupport trait.

This, in turn, inherits from Tasks (which you mention) and defines such operations as:

def execute[R, Tp](fjtask: Task[R, Tp]): () => R

def executeAndWaitResult[R, Tp](task: Task[R, Tp]): R

However, it's not immediately obvious to me how you could supply your own TaskSupport implementation to override this behaviour, because the collections themselves import it explicitly. For example, in ParSeqLike line 47:

import tasksupport._

In fact, I would go so far as to say it looks like the parallelism is definitively not overridable (unless I am very much mistaken, though I often am).

oxbow_lakes answered Oct 21 '22 08:10


Here is a document describing how to switch TaskSupport objects in Scala 2.10.
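
As a rough sketch of what switching the TaskSupport looks like from 2.10 onwards: each parallel collection has a settable tasksupport field, and ForkJoinTaskSupport wraps a fork-join pool of your choosing. The code below assumes Scala 2.12 (or 2.13 with the separate scala-parallel-collections module on the classpath). On 2.10 and 2.11 the pool class would be scala.concurrent.forkjoin.ForkJoinPool instead of java.util.concurrent.ForkJoinPool. The names TaskSupportSketch and incrementAll are hypothetical, made up for this illustration.

```scala
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport
import scala.collection.parallel.mutable.ParArray

// Hypothetical helper object, for illustration only.
object TaskSupportSketch {

  // Adds 1 to every element in parallel, on a dedicated pool with the
  // requested parallelism level instead of the default shared pool.
  def incrementAll(xs: Seq[Int], parallelism: Int): List[Int] = {
    val pool = new ForkJoinPool(parallelism)
    try {
      val pc = ParArray(xs: _*)
      // Replace the task support for this one collection instance.
      pc.tasksupport = new ForkJoinTaskSupport(pool)
      pc.map(_ + 1).toList
    } finally {
      // The pool was created here, so it is shut down here.
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit =
    println(incrementAll(Seq(1, 2, 3), parallelism = 2)) // prints List(2, 3, 4)
}
```

Note that this only swaps the pool for one collection instance. Distributing tasks across a network, as the question asks, would presumably need a custom TaskSupport implementation, which this sketch does not attempt.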

axel22 answered Oct 21 '22 09:10