How to set thread number for the parallel collections?

Tags:

parallel-processing

scala

I can run Scala's foreach in parallel like this:

val N = 100
(0 until N).par.foreach(i => {
   // do something
})

But how can I set the number of threads? I want something like this:

val N = 100
val NThreads = 5
(0 until N).par.foreach(NThreads, i => {
   // do something
})

463

asked Jun 09 '16 12:06

user1312837

2 Answers

Every parallel collection holds a tasksupport object, which in turn holds a reference to a thread pool implementation.

So you can set the parallelism level by assigning a new tasksupport object backed by a thread pool of the size you need. For example:

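def f(numOfThreads: Int, n: Int): Unit = {
  import scala.collection.parallel._
  val coll = (0 to n).par
  // Back this collection with a dedicated pool of numOfThreads workers.
  // On Scala 2.12+, use java.util.concurrent.ForkJoinPool instead.
  coll.tasksupport = new ForkJoinTaskSupport(new scala.concurrent.forkjoin.ForkJoinPool(numOfThreads))
  coll.foreach(i => {
    // do something
  })
}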

f(2, 100)

For more information on configuring parallel collections, refer to http://docs.scala-lang.org/overviews/parallel-collections/configuration.html
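One way to check that the setting takes effect is to record which threads actually run the loop body. The following is a minimal sketch, assuming Scala 2.12 (where `.par` is part of the standard library and `ForkJoinTaskSupport` accepts a `java.util.concurrent.ForkJoinPool`); on Scala 2.13 you would also need the separate scala-parallel-collections module and `import scala.collection.parallel.CollectionConverters._`. The names `ParallelismCheck` and `threadsUsed` are made up for this illustration.

```scala
import java.util.concurrent.{ConcurrentHashMap, ForkJoinPool}
import scala.collection.parallel.ForkJoinTaskSupport

object ParallelismCheck {
  // Hypothetical helper: runs a parallel foreach on a dedicated pool and
  // returns how many distinct threads executed the loop body.
  def threadsUsed(numThreads: Int, n: Int): Int = {
    val pool = new ForkJoinPool(numThreads)
    try {
      val coll = (0 until n).par
      coll.tasksupport = new ForkJoinTaskSupport(pool)
      // Thread-safe set, because the loop body runs concurrently.
      val seen = ConcurrentHashMap.newKeySet[String]()
      coll.foreach { _ =>
        seen.add(Thread.currentThread().getName)
      }
      seen.size
    } finally {
      // Release the worker threads once the work is done.
      pool.shutdown()
    }
  }
}
```

Calling `ParallelismCheck.threadsUsed(2, 1000)` should normally report a small number close to the requested parallelism. Treat it as a sanity check rather than a strict guarantee, since a fork-join pool is allowed to start extra worker threads when existing ones are blocked.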

132

answered Oct 12 '22 09:10

curious

Official Scala documentation provides a way to change the task support of a parallel collection like this:

import scala.collection.parallel._
val pc = mutable.ParArray(1, 2, 3)
pc.tasksupport = new ForkJoinTaskSupport(new scala.concurrent.forkjoin.ForkJoinPool(2))

The documentation also mentions that:

The execution context task support is set to each parallel collection by default, so parallel collections reuse the same fork-join pool as the future API.

This means that you should create a single pool and reuse it. The following approach leaks resources, because every call creates a new pool that is never shut down:

def calculate(collection: Seq[Int]): Seq[Int] = {
  val parallel = collection.par
  parallel.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(5))
  parallel.map(_ * 2).seq
}

The right way to do this is to reuse an existing pool:

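import scala.collection.parallel.ForkJoinTaskSupport
import java.util.concurrent.ForkJoinPool // on Scala 2.11: scala.concurrent.forkjoin.ForkJoinPool

// Created once and shared by every call to calculate.
val taskSupport = new ForkJoinTaskSupport(new ForkJoinPool(5))

def calculate(collection: Seq[Int]): Seq[Int] = {
  val parallel = collection.par
  parallel.tasksupport = taskSupport
  parallel.map(_ * 2).seq
}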
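If a shared, long-lived pool does not fit your case (for example, the thread count differs per call), another option is to create the pool per call and shut it down explicitly when the work is finished. This is only a sketch, assuming Scala 2.12 and `java.util.concurrent.ForkJoinPool`; `ScopedPool` and `withParallelism` are hypothetical names, not part of the Scala library.

```scala
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport

object ScopedPool {
  // Hypothetical helper: lends a task support backed by a fresh pool to
  // `body`, then shuts the pool down even if `body` throws.
  def withParallelism[A](numThreads: Int)(body: ForkJoinTaskSupport => A): A = {
    val pool = new ForkJoinPool(numThreads)
    try body(new ForkJoinTaskSupport(pool))
    finally pool.shutdown()
  }

  def calculate(collection: Seq[Int]): Seq[Int] =
    withParallelism(5) { taskSupport =>
      val parallel = collection.par
      parallel.tasksupport = taskSupport
      // map on a parallel collection is strict, so the result is fully
      // computed before the pool is shut down.
      parallel.map(_ * 2).seq
    }
}
```

The trade-off is that each call pays the cost of starting new threads, so the shared pool above is usually the better default when the thread count is fixed.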

answered Oct 12 '22 11:10

Avseiytsev Dmitriy

