In the parallel programming course from EPFL, four abstractions for data parallelism are mentioned: Iterator, Builder, Combiner, and Splitter.
I am familiar with Iterator, but have never used the other three. I have seen the traits Builder, Combiner, and Splitter under the scala.collection package. However, I have no idea how to use them in real-world development, particularly how to use them together with other collections like List, Array, ParArray, etc. Could anyone please give me some guidance and examples?
Thanks!
The two traits Iterator and Builder are not specific to parallelism; however, they provide the basis for Combiner and Splitter.
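To make the two sequential building blocks concrete, here is a small sketch that uses only the standard library and should compile on Scala 2.12, 2.13, and 3 (the object and method names are illustrative, not part of any API). It walks a List with an explicit Iterator, and it uses the Builders returned by List.newBuilder and Array.newBuilder to assemble new collections element by element.

```scala
import scala.collection.mutable.Builder

object IteratorBuilderDemo {
  // Drain an Iterator by hand with hasNext/next(), summing the elements.
  def sumWithIterator(xs: List[Int]): Int = {
    val it: Iterator[Int] = xs.iterator
    var total = 0
    while (it.hasNext) total += it.next()
    total
  }

  // Build a new List by appending the square of each input element.
  def squaresAsList(xs: Seq[Int]): List[Int] = {
    val b: Builder[Int, List[Int]] = List.newBuilder[Int]
    for (x <- xs) b += x * x
    b.result()
  }

  // Same idea for an Array: Array.newBuilder needs a ClassTag,
  // which the compiler supplies implicitly for Int.
  def evensAsArray(xs: Seq[Int]): Array[Int] = {
    val b = Array.newBuilder[Int]
    xs.foreach(x => if (x % 2 == 0) b += x)
    b.result()
  }
}
```

For example, squaresAsList(Seq(1, 2, 3)) returns List(1, 4, 9). A Builder is write-only: you add elements with += and call result() once at the end, which is why it fits so naturally as the base of Combiner.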
An Iterator helps you iterate over sequential collections by providing the methods hasNext and next. A Splitter is a special case of an Iterator that additionally helps to partition a collection into multiple disjoint subsets. The idea is that after splitting, these subsets can be processed in parallel. A parallel collection provides its Splitter through its splitter method. The two important methods of the Splitter trait are as follows:
The method remaining: Int returns the number of elements in the current collection, or at least an approximation of that number. This information is important because it is used to decide whether it is worth splitting the collection. If your collection contains only a small number of elements, you want to process them sequentially instead of splitting the collection into even smaller subsets.
The method split: Seq[Splitter[A]] actually splits the current collection. It returns disjoint subsets (represented as Splitters), which can recursively be split again if that is worthwhile. Once the subsets are small enough, they are finally processed (e.g. filtered or mapped).
Builders are used internally to create new (sequential) collections. A Combiner is a special case of a Builder and, at the same time, the counterpart to Splitter. While a Splitter splits your collection before it is processed in parallel, a Combiner puts the results back together afterwards. A parallel collection (or subset) provides a Combiner through its newCombiner method. The most important method of a Combiner is the following:
combine(that: Combiner[A, B]): Combiner[A, B] combines your current collection with another collection by "merging" both Combiners. The result is a new Combiner, which either represents the final result or gets combined again with another subset. (By the way: the type parameters A and B represent the element type and the type of the resulting collection.)
The thing is that you don't need to implement or even use these methods directly unless you define a new parallel collection. The idea is that people implementing new parallel collections only need to define splitters and combiners and get a whole bunch of other operations for free, because those operations are already implemented in terms of splitters and combiners.
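To illustrate how an operation can come "for free" once splitters and combiners exist, here is a self-contained sketch that deliberately does not use the real scala.collection.parallel classes. MySplitter and MyCombiner are hypothetical, simplified traits modeled on the methods described above (remaining, split, combine); ArraySplitter, VectorCombiner, ParOps, and parFilter are likewise illustrative names. The generic parFilter relies only on those methods and runs sub-tasks with standard-library Futures; the threshold of 4 is an arbitrary assumption.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical, simplified Splitter: an Iterator that knows its size and can split itself.
trait MySplitter[A] extends Iterator[A] {
  def remaining: Int
  def split: Seq[MySplitter[A]]
}

// Hypothetical, simplified Combiner: a builder whose partial results can be merged.
trait MyCombiner[A, Repr] {
  def +=(elem: A): this.type
  def combine(that: MyCombiner[A, Repr]): MyCombiner[A, Repr]
  def result(): Repr
}

// Iterates over the slice [from, until) of an array and splits it into two halves.
class ArraySplitter[A](arr: Array[A], from: Int, until: Int) extends MySplitter[A] {
  private var i = from
  def hasNext: Boolean = i < until
  def next(): A = { val x = arr(i); i += 1; x }
  def remaining: Int = until - i
  def split: Seq[MySplitter[A]] = {
    val mid = i + remaining / 2
    Seq(new ArraySplitter(arr, i, mid), new ArraySplitter(arr, mid, until))
  }
}

// Collects elements into a Vector; combining appends the other partial Vector.
class VectorCombiner[A](private var acc: Vector[A] = Vector.empty[A])
    extends MyCombiner[A, Vector[A]] {
  def +=(elem: A): this.type = { acc = acc :+ elem; this }
  def combine(that: MyCombiner[A, Vector[A]]): MyCombiner[A, Vector[A]] =
    new VectorCombiner(acc ++ that.result())
  def result(): Vector[A] = acc
}

object ParOps {
  val threshold = 4 // assumed cut-off below which we stop splitting

  // A generic parallel filter written only against MySplitter and MyCombiner.
  def parFilter[A, Repr](s: MySplitter[A], newCombiner: () => MyCombiner[A, Repr])
                        (p: A => Boolean): MyCombiner[A, Repr] =
    if (s.remaining <= threshold) {
      // Small enough: process sequentially using the plain Iterator methods.
      val c = newCombiner()
      while (s.hasNext) { val x = s.next(); if (p(x)) c += x }
      c
    } else {
      // Split, process the parts concurrently, then merge the partial results in order.
      val parts = s.split.map(part => Future(parFilter(part, newCombiner)(p)))
      parts.map(f => Await.result(f, Duration.Inf)).reduce(_ combine _)
    }
}
```

For example, ParOps.parFilter[Int, Vector[Int]](new ArraySplitter(data, 0, data.length), () => new VectorCombiner[Int]())(_ % 2 == 0).result() on data = (1 to 20).toArray yields the even numbers in their original order, because combine appends the right part after the left one. The real library traits have richer signatures, so treat this only as a model of the idea, not as the library's implementation.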
Of course, this is only a superficial description of how these things work. For further reading, I recommend Architecture of the Parallel Collections Library as well as Creating Custom Parallel Collections.