Is it possible, using Scala's parallel collections, to parallelize an Iterator without evaluating it completely beforehand?
Here I am talking about parallelizing the functional transformations on an Iterator, namely map and flatMap.
I think this requires evaluating some elements of the Iterator in advance and then computing more once some have been consumed via next.
Everything I could find requires converting the iterator to an Iterable or, at best, a Stream. The Stream then gets completely evaluated when I call .par on it.
I also welcome implementation proposals if this is not readily available. Implementations should support parallel map and flatMap.
The design of Scala's parallel collections library is inspired by and deeply integrated with Scala's (sequential) collections library (introduced in 2.8). It provides a parallel counterpart to a number of important data structures from Scala's (sequential) collection library, including: ParArray, ParVector, mutable.
The par function is short for parallel, and represents a convenient way of accessing Scala's Parallel Collections. These are designed from the ground up to work seamlessly with both Scala's Mutable and Immutable collection data structures.
I realize that this is an old question, but does the ParIterator implementation in the iterata library do what you were looking for?
scala> import com.timgroup.iterata.ParIterator.Implicits._
scala> val it = (1 to 100000).toIterator.par().map(n => (n + 1, Thread.currentThread.getId))
scala> it.map(_._2).toSet.size
res2: Int = 8 // addition was distributed over 8 threads
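If you would rather not pull in a library, the idea from the question (evaluate a few elements ahead, then compute more as they are consumed) can be sketched with plain Futures. This is only a rough sketch under my own assumptions: `ChunkedParIterator`, `parMap`, `parFlatMap` and `chunkSize` are hypothetical names, not part of Scala's parallel collections or of iterata, and I am not claiming this is how iterata is implemented.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical helper object, not taken from any library.
object ChunkedParIterator {

  // Maps f over the iterator in parallel, one chunk at a time.
  // A chunk of up to chunkSize source elements is pulled only when the
  // consumer asks for the next result, so the source is never fully evaluated.
  def parMap[A, B](it: Iterator[A], chunkSize: Int)(f: A => B)
                  (implicit ec: ExecutionContext): Iterator[B] =
    it.grouped(chunkSize).flatMap { chunk =>
      // One Future per element; Future.sequence keeps the original order.
      val futures = chunk.map(a => Future(f(a)))
      // Blocks the consumer until the whole chunk is done.
      Await.result(Future.sequence(futures), Duration.Inf)
    }

  // Parallel flatMap: run f in parallel per chunk, then flatten sequentially.
  // Assumes f returns a strict collection, so the real work happens inside f.
  def parFlatMap[A, B](it: Iterator[A], chunkSize: Int)(f: A => Iterable[B])
                      (implicit ec: ExecutionContext): Iterator[B] =
    parMap(it, chunkSize)(f).flatMap(xs => xs)
}
```

With `scala.concurrent.ExecutionContext.Implicits.global` in scope, `ChunkedParIterator.parMap(Iterator.from(1), 1000)(_ * 2).take(5).toList` works even on an infinite iterator, because only the first chunk is ever pulled. The trade-off is that the consumer waits for a whole chunk to finish and nothing is computed while it is consuming, so a slow element stalls its entire chunk; overlapping consumption with computation of the next chunk would need a prefetching queue on top of this.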