
What is the preferred way to use the parallel collections in Scala?

At first I assumed that every collection class would receive an additional par method that converts the collection to a fitting parallel data structure (much as map returns the best collection for the element type in Scala 2.8).

Now it seems that some collection classes support a par method (e.g. Array) while others have toParSeq and toParIterable methods (e.g. List). This is a bit weird, since Array isn't used or recommended that often.

What is the reason for that? Wouldn't it be better to just have a par method available on all collection classes that does the "right thing"?

If I have data which might be processed in parallel, what types should I use? The traits in scala.collection or the type of the implementation directly?

Or should I prefer Arrays now, because they seem to be cheaper to parallelize?

soc asked Dec 18 '10 17:12


1 Answer

Lists aren't well suited to parallel processing. To reach the end of a list, you have to walk through every single element. You may therefore as well treat the list as an iterator, and use something more generic like toParIterable.

Any collection that has a fast index is a good candidate for parallel processing. This includes anything implementing IndexedSeqOptimized, plus trees and hash tables. Array has as fast an index as you can get, so it's a fairly natural choice. You can also use things like ArrayBuffer (which has a par method returning a ParArray).
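To illustrate why a fast index matters, here is a minimal sketch that splits work by index range using plain Futures from the standard library. It is not how the parallel collections are implemented; the names SplitSketch and chunkedSum are made up for this example, and the chunk count is an arbitrary parameter.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object SplitSketch {
  // Hypothetical helper: sums xs by cutting its index range into roughly
  // `chunks` pieces and summing each piece in its own Future.
  // Each piece is just a pair of indices, so splitting costs nothing
  // when xs(i) is fast (Array, Vector, ArrayBuffer). On a List, every
  // xs(i) would have to walk from the head.
  def chunkedSum(xs: scala.collection.IndexedSeq[Int], chunks: Int): Long = {
    require(chunks > 0, "chunks must be positive")
    // Ceiling division, so that every element lands in some chunk.
    val size = math.max(1, (xs.length + chunks - 1) / chunks)
    val parts = (0 until xs.length by size).map { lo =>
      val hi = math.min(lo + size, xs.length)
      Future {
        var acc = 0L
        var i = lo
        while (i < hi) { acc += xs(i); i += 1 }
        acc
      }
    }
    // Wait for all partial sums and combine them.
    Await.result(Future.sequence(parts), Duration.Inf).sum
  }
}

// e.g. SplitSketch.chunkedSum(Vector.range(1, 101), 4) gives 5050
```

The parameter is typed as scala.collection.IndexedSeq so that an Array, a Vector or an ArrayBuffer can all be passed in; a List does not conform, which matches the point above about linear sequences.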

Rex Kerr answered Sep 29 '22 06:09