
Parallel iterator in Scala

Is it somehow possible, using Scala's parallel collections, to parallelize an Iterator without evaluating it completely beforehand?

Here I am talking about parallelizing the functional transformations on an Iterator, namely map and flatMap. I think this requires evaluating some elements of the Iterator in advance, and then computing more, once some are consumed via next.

Everything I could find requires the iterator to be converted to an Iterable, or to a Stream at best. The Stream then gets completely evaluated when I call .par on it.

I also welcome implementation proposals if this is not readily available. Implementations should support parallel map and flatMap.

ziggystar asked Jun 18 '13 20:06



1 Answer

I realize that this is an old question, but does the ParIterator implementation in the iterata library do what you were looking for?

scala> import com.timgroup.iterata.ParIterator.Implicits._
scala> val it = (1 to 100000).toIterator.par().map(n => (n + 1, Thread.currentThread.getId))
scala> it.map(_._2).toSet.size
res2: Int = 8 // addition was distributed over 8 threads
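
If you would rather not pull in a library, the same general idea (evaluate a bounded chunk ahead, process it in parallel, then pull the next chunk) can be sketched with plain Futures. To be clear, this is not how iterata is implemented as far as I know; ChunkedPar, parMap, parFlatMap and chunkSize are names I made up for illustration, and the sketch assumes that blocking the consuming thread while a chunk is being computed is acceptable.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical helper; not part of iterata or the standard library.
object ChunkedPar {

  // Pulls at most `chunkSize` elements from the source at a time,
  // applies `f` to every element of that chunk on the execution context,
  // and waits for the whole chunk before handing out its results.
  // The next chunk is only pulled once the consumer asks for more.
  def parMap[A, B](it: Iterator[A], chunkSize: Int)(f: A => B)
                  (implicit ec: ExecutionContext): Iterator[B] =
    it.grouped(chunkSize).flatMap { chunk =>
      val futures = chunk.map(a => Future(f(a))) // start the whole chunk
      // Future.sequence keeps the element order of the chunk.
      Await.result(Future.sequence(futures), Duration.Inf)
    }

  // flatMap expressed via parMap: `f` runs in parallel per chunk,
  // and the resulting collections are flattened sequentially.
  def parFlatMap[A, B](it: Iterator[A], chunkSize: Int)(f: A => Iterable[B])
                      (implicit ec: ExecutionContext): Iterator[B] =
    parMap(it, chunkSize)(f).flatMap(_.iterator)
}
```

With scala.concurrent.ExecutionContext.Implicits.global in scope, ChunkedPar.parMap(Iterator.from(1), 64)(_ * 2).take(5).toList works on an infinite iterator and only ever evaluates the first 64 elements. The trade-off is that one slow element stalls its entire chunk; a look-ahead buffer that starts the next chunk while the current one is being consumed would probably give better utilisation, at the cost of more bookkeeping.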
ms-tg answered Sep 22 '22 09:09