Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which operations on Scala parallel collections are parallelized?

It seems like when I invoke map on a parallel list, the operation runs in parallel, but when I do filter on that list, the operation runs strictly sequentially. So to make filter parallel, I first do map to (A,Boolean), then filter those tuples, and map all back again. It feels not very convenient.

So I am interested - which operations on parallel collections are parallelized and which are not?

like image 746
Rogach Avatar asked Oct 05 '11 15:10

Rogach


People also ask

What is parallelized collection?

Parallelized collections are created by calling SparkContext 's parallelize method on an existing collection in your driver program (a Scala Seq ). The elements of the collection are copied to form a distributed dataset that can be operated on in parallel.

What is par in Scala?

The par method on collection provides a very easy high level API to allow computation to run in parallel to take advantage of multi-core processing. When you call the par method on a collection, it will copy all the elements into an equivalent Scala Parallel Collection.


1 Answers

There are no parallel lists. Calling par on a List converts the List into the default parallel immutable sequence - a ParVector. This conversion proceeds sequentially. Both the filter and the map should then be parallel.

scala> import scala.collection._
import scala.collection._

scala> List(1, 2, 3).par.filter { x => println(Thread.currentThread); x > 0 }
Thread[ForkJoinPool-1-worker-5,5,main]
Thread[ForkJoinPool-1-worker-3,5,main]
Thread[ForkJoinPool-1-worker-0,5,main]
res0: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 3)

Perhaps you've concluded that the filter is not parallel, because you've measured both the conversion time and the filter time.

Some operations not parallelized currently: sort* variants, indexOfSlice.

like image 103
axel22 Avatar answered Oct 04 '22 19:10

axel22