
Parallel data collections in F#

Which of list, array or seq is most efficient for parallel processing, and which makes it easiest to implement parallel operations such as parmap, parfilter, etc.?

EDIT: Thanks for the suggestions. Array.Parallel looks like a good option. I also checked out PSeq.fs and I have a question about how the pmap below works.

let pmap f xs =
   seq { for x in xs -> async { return f x } }
   |> Async.Parallel
   |> Async.RunSynchronously

Does a new thread get spawned for each element in the sequence? If so, is there a way of breaking the seq into chunks and creating a new task for each chunk, so that the chunks are evaluated in parallel?

I would also like to know whether there is a similar pmap implementation for list. I found that Tomas has a ParallelList implementation in his blog post here. However, I am not sure whether converting a list to an array for parallel evaluation incurs too much overhead, and whether the conversion can be avoided.

EDIT: Thanks for all your inputs. Tomas answered my original question.

Answering my own question in the first edit:

I tried breaking a big list into chunks and then applying async to each sublist.

let pmapchunk f xs =
    let chunks = chunk chunksize xs
    seq { for chunk in chunks -> async { return (Seq.map f) chunk } }
    |> Async.Parallel
    |> Async.RunSynchronously
    |> Seq.concat

The results: map: 15s, pmap: 7s, pmapchunk: 10s.
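The `chunk` helper and `chunksize` value are not shown above. A minimal sketch of the same idea, assuming F# 4.0 or later (where `Seq.chunkBySize` is available), might look like the following. The name `pmapChunked` and its parameters are hypothetical. Note that `Seq.map` is lazy, so this sketch uses `Array.map` to make sure each chunk is actually computed inside its own async workflow rather than later, when the result is enumerated.

```fsharp
// Hypothetical chunked parallel map: one async workflow per chunk.
// Assumes F# 4.0+ for Seq.chunkBySize.
let pmapChunked (chunkSize: int) (f: 'a -> 'b) (xs: seq<'a>) : 'b[] =
    xs
    |> Seq.chunkBySize chunkSize                                   // seq of arrays, each up to chunkSize long
    |> Seq.map (fun chunk -> async { return Array.map f chunk })   // eager map inside each async
    |> Async.Parallel                                              // run the chunk workflows in parallel
    |> Async.RunSynchronously
    |> Array.concat                                                // flatten; input order is preserved

// Example usage
let doubled = pmapChunked 3 (fun x -> x * 2) [ 1 .. 10 ]
```

Whether chunking helps depends on how expensive `f` is relative to the scheduling overhead, so the chunk size is something to measure rather than guess.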

asked Mar 13 '12 01:03 by vis


2 Answers

There is a parallel implementation of some array operations in the F# library. In general, working with arrays is probably going to be most efficient if the individual operations take a long time.

  • Take a look at the Array.Parallel module. It contains functions for creating an array (init) and for performing calculations with elements (map), as well as a choose function that can be used to implement filtering.
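As a rough illustration of the three functions mentioned above, here is a small sketch using only the `Array.Parallel` module from the F# core library. The helper names `parFilter` and `pmapList` are hypothetical, and the list wrapper simply converts to an array and back, which costs one extra copy in each direction.

```fsharp
// Parallel map over an array
let squares = Array.Parallel.map (fun x -> x * x) [| 1 .. 10 |]

// Parallel creation of an array from an index function
let tens = Array.Parallel.init 5 (fun i -> i * 10)

// Hypothetical parallel filter built on Array.Parallel.choose
let parFilter (pred: 'a -> bool) (xs: 'a[]) : 'a[] =
    xs |> Array.Parallel.choose (fun x -> if pred x then Some x else None)

let evens = parFilter (fun x -> x % 2 = 0) [| 1 .. 10 |]

// Hypothetical parallel map for lists: copy to an array, map in parallel, copy back
let pmapList (f: 'a -> 'b) (xs: 'a list) : 'b list =
    xs |> List.toArray |> Array.Parallel.map f |> Array.toList

let lengths = pmapList String.length [ "a"; "bb"; "ccc" ]
```

The two copies in `pmapList` are linear in the list length, so they tend to matter only when the mapped function itself is very cheap.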

If you're writing a complex pipeline made up of a large number of fairly simple operations, you'll need to use PLINQ, which parallelizes the entire pipeline as opposed to parallelizing just individual operations (like map).

  • Take a look at the PSeq module from the F# PowerPack for an F#-friendly wrapper. It defines the pseq<'T> type and the usual functions for working with it. This blog post also contains some useful information.
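To show what parallelizing a whole pipeline looks like without depending on the PowerPack, here is a minimal sketch that calls PLINQ directly through `System.Linq.ParallelEnumerable`. The function name `evenSquares` is hypothetical, and `AsOrdered` is used only so that the output order matches the input order; dropping it may be faster when order does not matter.

```fsharp
open System
open System.Linq

// Hypothetical pipeline: filter then map, with both stages run by PLINQ as one parallel query
let evenSquares (xs: seq<int>) : int[] =
    // Turn the sequence into a parallel query and ask PLINQ to preserve input order
    let ordered = ParallelEnumerable.AsOrdered(xs.AsParallel())
    let filtered = ParallelEnumerable.Where(ordered, Func<int, bool>(fun x -> x % 2 = 0))
    let mapped = ParallelEnumerable.Select(filtered, Func<int, int>(fun x -> x * x))
    // Nothing runs until the query is materialized here
    ParallelEnumerable.ToArray(mapped)

let result = evenSquares [ 1 .. 6 ]
```

A wrapper such as PSeq exposes the same kind of query through pipeline-friendly functions, so the explicit `Func` conversions above would not be needed there.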
answered Sep 28 '22 07:09 by Tomas Petricek


Along with Tomas' suggestion to look at Array.Parallel, it's worth noting that arrays (and array-backed collections) will always be the most efficient to traverse (map, iter, ...) because they're stored in contiguous memory.

answered Sep 28 '22 06:09 by Daniel