Which of list, array or seq is more efficient for parallel processing and can easily implement parallel operations such as parmap, parfilter, etc.?
EDIT:
Thanks for the suggestions. Array.Parallel looks like a good option. I also checked out PSeq.fs and I have a question about how the pmap below works.
let pmap f xs =
    // one async computation per element; f is applied to the element x
    seq { for x in xs -> async { return f x } }
    |> Async.Parallel
    |> Async.RunSynchronously
Does a new thread get spawned for each element in the sequence? If so, is there a way of breaking the seq into chunks and creating a new task for each chunk to get evaluated in parallel?
I would also like to see if there is any similar pmap implementation for list. I found Tomas has a ParallelList implementation in his blog post here. But I am not sure whether converting a list to an array to perform parallel evaluation incurs too much overhead, and whether it can be avoided.
EDIT: Thanks for all your inputs. Tomas answered my original question.
Answering my own question in the first edit:
I tried breaking a big list into chunks and then applying async to each sublist.
let pmapchunk f xs =
    // chunk and chunksize are defined elsewhere (not shown here)
    let chunks = chunk chunksize xs
    seq { for chunk in chunks -> async { return (Seq.map f) chunk } }
    |> Async.Parallel
    |> Async.RunSynchronously
    |> Seq.concat
The results: map: 15s, pmap: 7s, pmapchunk: 10s.
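For anyone who wants to reproduce the chunked variant, here is a self-contained sketch. It assumes Seq.chunkBySize from FSharp.Core as a stand-in for the chunk helper (which is not shown above). The name pmapChunked and the chunk size of 1000 are illustrative only. Each chunk is mapped with Array.map rather than Seq.map, because Seq.map is lazy and would defer the actual work until the results are enumerated.

```fsharp
// Sketch only: Seq.chunkBySize stands in for the `chunk` helper, which is not shown above.
let pmapChunked (chunkSize: int) (f: 'a -> 'b) (xs: seq<'a>) : 'b[] =
    xs
    |> Seq.chunkBySize chunkSize                                  // split the input into arrays of at most chunkSize items
    |> Seq.map (fun chunk -> async { return Array.map f chunk })   // one async per chunk, mapped eagerly inside it
    |> Async.Parallel                                              // run all chunk computations in parallel
    |> Async.RunSynchronously                                      // block until every chunk is done
    |> Array.concat                                                // flatten the per-chunk results

// Usage: Async.Parallel returns results in input order, so the output lines up with the input.
let squares = pmapChunked 1000 (fun x -> x * x) [ 1 .. 10000 ]
```

The chunk size is the knob to experiment with: larger chunks mean fewer async computations but coarser load balancing.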
There is a parallel implementation of some array operations in the F# library. In general, working with arrays is probably going to be most efficient if the individual operations take a long time.
Take a look at the Array.Parallel module. It contains functions for creating arrays (init), for performing calculations with elements (map), and a choose function that can be used to implement filtering.
If you're writing a complex pipeline of operations that are fairly simple, but there is a large number of them, you'll need to use PLINQ, which parallelizes the entire pipeline as opposed to parallelizing just individual operations (like map).
Take a look at the PSeq module from the F# PowerPack for an F#-friendly wrapper. It defines the pseq<'T> type and the usual functions for working with it. This blog post also contains some useful information.
Along with Tomas' suggestion to look at Array.Parallel, it's worth noting that arrays (and array-backed collections) will always be the most efficient to traverse (map, iter, ...) because they're stored in contiguous memory.
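To make the difference between the two suggestions concrete, here is a small sketch that computes the same result both ways. It is only an illustration under a few assumptions: the value names are made up, the workload is trivial, and the PLINQ half calls System.Linq directly instead of going through the PowerPack PSeq wrapper.

```fsharp
open System.Linq

// Per-operation parallelism: each Array.Parallel call runs in parallel
// and builds a complete intermediate array before the next step starts.
let inputs = Array.Parallel.init 1000 (fun i -> i + 1)
let squared = Array.Parallel.map (fun x -> x * x) inputs
let evenSquares =
    squared |> Array.Parallel.choose (fun x -> if x % 2 = 0 then Some x else None)

// Whole-pipeline parallelism: PLINQ plans the map and the filter as one query.
// AsOrdered() asks PLINQ to keep the input order so the two results are comparable.
let evenSquaresPlinq =
    inputs
        .AsParallel()
        .AsOrdered()
        .Select(fun x -> x * x)
        .Where(fun x -> x % 2 = 0)
        .ToArray()
```

Both evenSquares and evenSquaresPlinq should hold the even squares of 1 to 1000. Which version is faster depends on how expensive each step is, so it is worth measuring on the real workload.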