Is it somehow possible, using Scala's parallel collections, to parallelize an Iterator without evaluating it completely beforehand?
Here I am talking about parallelizing the functional transformations on an Iterator, namely map and flatMap. I think this requires evaluating some elements of the Iterator in advance, and then computing more once some are consumed via next.
All I could find would require the iterator to be converted to an Iterable or, at best, a Stream. The Stream then gets completely evaluated when I call .par on it.
I also welcome implementation proposals if this is not readily available. Implementations should support parallel map and flatMap.
The design of Scala's parallel collections library is inspired by and deeply integrated with Scala's (sequential) collections library (introduced in 2.8). It provides a parallel counterpart to a number of important data structures from Scala's (sequential) collection library, including ParArray and ParVector, among others.
The par method is short for parallel, and is a convenient way of obtaining a parallel version of a collection. Parallel collections are designed to work with both Scala's mutable and immutable collection data structures.
I realize that this is an old question, but does the ParIterator implementation in the iterata library do what you were looking for?
scala> import com.timgroup.iterata.ParIterator.Implicits._
scala> val it = (1 to 100000).toIterator.par().map(n => (n + 1, Thread.currentThread.getId))
scala> it.map(_._2).toSet.size
res2: Int = 8 // addition was distributed over 8 threads
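If pulling in a library is not an option, the look-ahead idea from the question (evaluate a few elements in advance, then compute more as results are consumed via next) can be sketched with plain Futures. This is only a minimal sketch under stated assumptions: LookAheadIterator, parMap, parFlatMap and the lookAhead parameter are hypothetical names made up for illustration, not part of the standard library and not how iterata is implemented. It assumes the source iterator is only touched from the consuming thread and that blocking on each result with Await is acceptable.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.collection.mutable

// Hypothetical helper for illustration only (not a standard-library or iterata API).
object LookAheadIterator {

  // Maps f over source using at most lookAhead futures in flight.
  // Results are returned in source order; the source is never fully evaluated up front.
  def parMap[A, B](source: Iterator[A], lookAhead: Int)(f: A => B)(
      implicit ec: ExecutionContext): Iterator[B] = new Iterator[B] {
    require(lookAhead > 0, "lookAhead must be positive")

    // Futures that have been started but not yet handed to the consumer.
    private val pending = mutable.Queue.empty[Future[B]]

    // Pull from the source on the calling thread, but only until the window is full.
    private def fill(): Unit =
      while (pending.size < lookAhead && source.hasNext) {
        val a = source.next()
        pending.enqueue(Future(f(a)))
      }

    def hasNext: Boolean = pending.nonEmpty || source.hasNext

    def next(): B = {
      fill()
      if (pending.isEmpty) throw new NoSuchElementException("next on empty iterator")
      // Block until the oldest element is ready, which preserves ordering.
      val result = Await.result(pending.dequeue(), Duration.Inf)
      fill() // a slot is free again, so start one more computation
      result
    }
  }

  // flatMap built on parMap: each f(a) is computed in parallel, then flattened in order.
  def parFlatMap[A, B](source: Iterator[A], lookAhead: Int)(f: A => Iterable[B])(
      implicit ec: ExecutionContext): Iterator[B] =
    parMap(source, lookAhead)(f).flatMap(_.iterator)
}
```

With an implicit ExecutionContext in scope (for example scala.concurrent.ExecutionContext.Implicits.global), LookAheadIterator.parMap(Iterator.from(1), 8)(n => n * n).take(3).toList should give List(1, 4, 9) even though the source is infinite, because only a bounded window is evaluated ahead of the consumer. The trade-off of this design is that one slow element at the head of the window stalls the consumer while later results sit finished in the queue.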