Haskell framework to parallelize non-threadsafe C++ lib

I have a closed-source, non-thread-safe C++ shared lib that provides one function f :: ByteString -> ByteString. The run time of this function can be anywhere from one second to a couple of hours.

I am looking for a way to distribute the calculation across multiple cores/servers (the same function applied to many independent inputs).

In a nutshell, I'm looking for a framework that provides a function

    g :: Strategy b -> (a -> b) -> a -> b

to lift a function that can only be called sequentially into a function that behaves like any other pure function in Haskell.

For instance, I want to be able to write:

    parMap rwhnf f args -- will not work

Since f calls a C function in a non-thread-safe lib via FFI, this will not work. Hence, I could replace the function f with a function g that holds a job queue and dispatches the tasks to N separate processes. The processes could run locally or distributed:

    parMap rwhnf g args -- should work
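A minimal sketch of the job-queue idea above, using only base and the process package. Everything here is an assumption for illustration: the names (Worker, startPool, callPool, parMapPool) are hypothetical, requests and replies are single lines of text rather than ByteStrings, and the Unix tool cat stands in for a real worker executable that would wrap the library and answer one line per request. There is no error handling or worker restart.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (bracket)
import Control.Monad (forM, replicateM_)
import System.IO (BufferMode (LineBuffering), Handle, hFlush, hGetLine,
                  hPutStrLn, hSetBuffering)
import System.Process (CreateProcess (..), StdStream (CreatePipe),
                       createProcess, proc)

-- One worker is one OS process, reached through its stdin/stdout pipes.
-- Each process has its own copy of the library, so calls never overlap
-- inside a single process.
data Worker = Worker { toWorker :: Handle, fromWorker :: Handle }

-- Idle workers wait in a channel; taking one out marks it as busy.
type Pool = Chan Worker

-- Start n copies of the (hypothetical) worker executable.
startPool :: Int -> FilePath -> [String] -> IO Pool
startPool n exe args = do
  pool <- newChan
  replicateM_ n $ do
    (Just hin, Just hout, _, _) <-
      createProcess (proc exe args) { std_in = CreatePipe, std_out = CreatePipe }
    hSetBuffering hin LineBuffering
    writeChan pool (Worker hin hout)
  pure pool

-- Borrow an idle worker, send one request line, read one reply line,
-- then hand the worker back to the pool.
callPool :: Pool -> String -> IO String
callPool pool request =
  bracket (readChan pool) (writeChan pool) $ \w -> do
    hPutStrLn (toWorker w) request
    hFlush (toWorker w)
    hGetLine (fromWorker w)

-- Fork one lightweight thread per input; each blocks until a worker is
-- free. Results come back in input order.
parMapPool :: Pool -> [String] -> IO [String]
parMapPool pool inputs = do
  slots <- forM inputs $ \x -> do
    slot <- newEmptyMVar
    _ <- forkIO (callPool pool x >>= putMVar slot)
    pure slot
  mapM takeMVar slots

main :: IO ()
main = do
  pool <- startPool 4 "cat" []   -- "cat" just echoes each line back
  results <- parMapPool pool ["alpha", "beta", "gamma"]
  mapM_ putStrLn results
```

The result lives in IO rather than behind a pure g, and a line-based protocol breaks on payloads containing newlines, so a real version would need length-prefixed binary framing. Running workers on other machines would replace the pipes with sockets but keep the same pool structure.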

Potential frameworks I already looked into are

  1. MPI: Client (Haskell) <-- MPI --> Broker (C++) <-- MPI --> Worker (C++) <--> Lib (C++)

  2. ZeroMQ: Client (Haskell) <-- ZeroMQ --> Broker (C++) <-- ZeroMQ --> Worker (C++) <--> Lib (C++)

  3. Cloud Haskell: Client (Haskell) <-- CloudHaskell --> Worker (Haskell) <-- FFI --> Lib (C++)

  4. Gearman

  5. Erlang: Client (Haskell) <-- Erlang --> Broker (Erlang) <-- Erlang C Node --> Worker (C++)

Each approach has advantages and disadvantages.

  1. MPI will create a lot of security issues and is a pretty heavy-weight solution.

  2. ZeroMQ is a nice solution but would require that I write the broker/load balancer etc. all by myself (especially getting the reliability right is not trivial).

  3. CloudHaskell doesn't look very mature.

  4. Gearman doesn't run on Windows and has no Haskell bindings. I know about java-gearman-service but it is much less mature than the C daemon and has some other issues (e.g. no doc, shuts down if there is no incoming flow of tasks for some time, etc.).

  5. Erlang is similar to 1 (MPI) and requires the use of a third language.

Thanks!

Chronos asked May 12 '12 20:05



1 Answer

Since the library you are using is not thread-safe, you need a solution that uses processes as its abstraction for parallelism. The example you would like to write, using parMap from Control.Parallel.Strategies, relies on the spark-based (task-based) parallelism model, where many sparks run on the threads of a single process. Clearly this is not what you are looking for.
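To illustrate why in-process parallelism does not help here, consider the obvious workaround of guarding every call with one lock. The sketch below is an assumption-laden illustration: unsafeCall is a hypothetical stand-in for the library function that merely records how many callers are inside it at once. With the lock in place that peak never exceeds one, so the calls are safe but strictly sequential. Whether a lock alone is even sufficient for the real library (for example if it keeps thread-local state) is not something this sketch can show.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Monad (forM)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for the non-thread-safe call: it counts how many
-- callers are inside it at the same time and remembers the highest count.
unsafeCall :: IORef Int -> IORef Int -> Int -> IO Int
unsafeCall active peak x = do
  n <- atomicModifyIORef' active (\a -> (a + 1, a + 1))
  atomicModifyIORef' peak (\p -> (max p n, ()))
  threadDelay 10000                       -- pretend to do some work
  atomicModifyIORef' active (\a -> (a - 1, ()))
  pure (x * 2)

-- Fork one thread per input, but run every call under a single lock.
-- Returns the results (in input order) and the peak number of
-- simultaneous callers that was observed.
runLocked :: [Int] -> IO ([Int], Int)
runLocked xs = do
  lock   <- newMVar ()
  active <- newIORef 0
  peak   <- newIORef 0
  slots <- forM xs $ \x -> do
    slot <- newEmptyMVar
    _ <- forkIO (withMVar lock (\_ -> unsafeCall active peak x) >>= putMVar slot)
    pure slot
  results <- mapM takeMVar slots
  p <- readIORef peak
  pure (results, p)

main :: IO ()
main = do
  (results, p) <- runLocked [1 .. 8]
  print results
  putStrLn ("peak concurrent callers: " ++ show p)   -- 1 with the lock
```

However many threads or sparks are created, only one of them is ever inside the library, which is why a process-per-worker model is needed to get real speedup.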

Fear Not!

There are only a few paradigms in Haskell that work this way, and you mentioned one of them in your post: Cloud Haskell. While Cloud Haskell is not "mature" yet, it could solve your problem, but it may be a little heavyweight for your needs. If you really just need to take advantage of many local cores using process-level parallelism, then look at the Eden library:

http://www.mathematik.uni-marburg.de/~eden/

With Eden you can absolutely express what you are after. Here is a very simple example along the lines of your parMap-based version:

    f $# args

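Or in the case of many arguments you might just pull out ye olde map:

    map f $# args

Note that this parses as (map f) $# args, so the whole map is handed to a single child process. To get one process per argument, look at Eden's spawn function or the parMap skeleton from its skeleton library.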

For more information about the $# syntax and for tutorials about Eden see:

http://www.mathematik.uni-marburg.de/~eden/paper/edenCEFP.pdf

YMMV, as most of the more mature parallel paradigms in Haskell assume that you have some level of thread safety or that you can do the parallel work in a pure manner.

Good Luck and Happy Hacking!

krakrjak answered Oct 21 '22 22:10
