My goal is to parallelize a computation using parMap
from the parallel package, but I'd also like to add a bit of randomness to my sampling function.
Without the randomness my calculation is simply some number crunching and so it's pure and I could use parMap
. In order to get good results, I need to take multiple samples at each step and average the results. The sampling needs to be randomized.
One solution might be to use the random package, call randoms
and then consume that list during the computation (by passing the pure lazy list to the computation I would keep it pure). Unfortunately, that's a very slow random number generator and I need a lot of random numbers so I would prefer to use either mwc-random or mersenne-random (although, I don't think mersenne-random is still maintained).
Is it safe to use something like unsafePerformIO
with mwc-random to write a function like randoms
? Something like this:
randomsMWC :: Variate a => GenST s -> [a]
randomsMWC g = unsafePerformIO $ unsafeSTToIO $ randomsMWC' g
where
randomsMWC' g = do
a <- uniform g
as <- unsafeInterleaveST $ randomsMWC' g
return (a : as)
Do I need to instead reach for a parallel number generator? Or do I need to bite the bullet and admit that my algorithm is simply not pure without using the slow random package?
Suggestions? Thanks!
If having a single threaded source of randomness isn't a problem for performance, you can get a pure wrapper around mwc-random
import Control.Monad.ST.Lazy
import Control.Monad
import System.Random.MWC
rList :: Variate a => Seed -> [a]
rList s = runST $ do
g <- strictToLazyST $ restore s
advance g
advance :: Variate a => Gen s -> ST s [a]
advance g = do
x <- strictToLazyST $ uniform g
xs <- x `seq` advance g
return (x:xs)
here rList
takes a seed, and then lazily produces an infinite stream of lazy numbers deterministically. I am not sure that strictToLazyST
is really safe, but no one seems to object to it. I have not done any benchmarking, but I suspect this is pretty fast. I assume that mwc-random
is thread safe because of the explit data flow encoded with the generator, and that it can be used in the ST
monad. Inviting someone to use the hack above. I don't think the seq
is necessary, but it makes me less suspicous of strictToLazyST
that I know I have deterministic evaluation order (and it is still lazy enough to work).
You still need randomness (that is IO
) somewhere to generate a real random seed, but this should let you keep most of the code pure, and let you store the seed to file or reuse it when necessary.
GHCi:
λ: gen <- create
λ: x <- save gen
λ: drop 1 $ take 10 $ rList x :: [Int]
[4443157817804305558,-6283633474236987377,3602072957429409483,
-5035101780705339502,-925260411398682800,423158235494603307,
-6981326876987356147,490540715145304360,-8184987036354194323]
I have a not quite release-ready package hsrandom123
on Github that might be helpful here. I have started implementing this package in order to have a suitable RNG for parallel computations available. It reimplements the Philox and Threefry RNGs from the random123
C library (there's a paper describing the ideas there, too).
There's a reason my library is unreleased, though: while the actual RNG implementation is done and seems to produce the same results as the C version, I've been undecided what Haskell interface to use, and the library is hardly documented. Feel free to contact me if you need more info or help using it, though.
My guess is that mersenne-random isn't thread safe, so using any unsafe...
and parallelization will lead you to problems with calling it from multiple threads. (See also the GHC manual Section 8.2.4.1.)
Functions that need randomness aren't pure, they need some source of randomness, which is either external (hardware - like a device sampling noise) and hence bound to IO
, or pseudorandom, which needs to keep some state during the computation. Either way they can't be pure Haskell functions.
I'd start with separating your requirements to a specific monad type class, for example something like
class Monad m => MonadParRand m where
random :: MTRandom a => m a
parSequence :: [m a] -> m [a]
which will allow you to write your main code without being bound to a specific implementation. Or if you're feeling bold you could use monad-parallel and define something like
class MonadParallel m => MonadParRand m where
random :: MTRandom a => m a
Now the tricky part is how to define both random
and parSequence
(or MonadParallel
's bindM2
) and making it fast. Since you're in control of bindM2
, you can manage how threads are spawned and what state they keep. So you can bind a buffer to each thread from which it draws randomized numbers. If the buffer is empty, it makes a synchronized calls to mersenne-random (or another IO
-based number generator), fills the buffer and proceeds.
(If you implement something like that, it'd be very nice to make a standalone library from that.)
Note that randoms
from mersenne-random already uses unsafeInterleaveIO
to produce a lazy list, but I'd say the list is meant to be consumed only from a single thread. And it has also room for improvements. It uses unsafeInterleaveIO
and it has some overhead, as noted in its comment:
There are real overheads here. Consider eagerly filling chunks and extracting elements piecewise.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With