Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parallel computations with fast randomness and purity?

My goal is to parallelize a computation using parMap from the parallel package, but I'd also like to add a bit of randomness to my sampling function.

Without the randomness my calculation is simply some number crunching and so it's pure and I could use parMap. In order to get good results, I need to take multiple samples at each step and average the results. The sampling needs to be randomized.

One solution might be to use the random package, call randoms and then consume that list during the computation (by passing the pure lazy list to the computation I would keep it pure). Unfortunately, that's a very slow random number generator and I need a lot of random numbers so I would prefer to use either mwc-random or mersenne-random (although, I don't think mersenne-random is still maintained).

Is it safe to use something like unsafePerformIO with mwc-random to write a function like randoms? Something like this:

randomsMWC :: Variate a => GenST s -> [a]
randomsMWC g = unsafePerformIO $ unsafeSTToIO $ randomsMWC' g
  where
  randomsMWC' g = do
    a  <- uniform g
    as <- unsafeInterleaveST $ randomsMWC' g
    return (a : as)

Do I need to instead reach for a parallel number generator? Or do I need to bite the bullet and admit that my algorithm is simply not pure without using the slow random package?

Suggestions? Thanks!

like image 316
Jason Dagit Avatar asked Apr 27 '13 05:04

Jason Dagit


3 Answers

If having a single threaded source of randomness isn't a problem for performance, you can get a pure wrapper around mwc-random

import Control.Monad.ST.Lazy
import Control.Monad
import System.Random.MWC

rList :: Variate a => Seed -> [a]
rList s = runST $ do
  g <- strictToLazyST $ restore s
  advance g

advance :: Variate a => Gen s -> ST s [a]
advance g = do
  x <- strictToLazyST $ uniform g
  xs <- x `seq` advance g
  return (x:xs)

here rList takes a seed, and then lazily produces an infinite stream of lazy numbers deterministically. I am not sure that strictToLazyST is really safe, but no one seems to object to it. I have not done any benchmarking, but I suspect this is pretty fast. I assume that mwc-random is thread safe because of the explit data flow encoded with the generator, and that it can be used in the ST monad. Inviting someone to use the hack above. I don't think the seq is necessary, but it makes me less suspicous of strictToLazyST that I know I have deterministic evaluation order (and it is still lazy enough to work).

You still need randomness (that is IO) somewhere to generate a real random seed, but this should let you keep most of the code pure, and let you store the seed to file or reuse it when necessary.

GHCi:

λ: gen <- create
λ: x <- save gen
λ: drop 1 $ take 10 $ rList x :: [Int]
[4443157817804305558,-6283633474236987377,3602072957429409483,
 -5035101780705339502,-925260411398682800,423158235494603307,
 -6981326876987356147,490540715145304360,-8184987036354194323]
like image 84
Philip JF Avatar answered Nov 05 '22 05:11

Philip JF


I have a not quite release-ready package hsrandom123 on Github that might be helpful here. I have started implementing this package in order to have a suitable RNG for parallel computations available. It reimplements the Philox and Threefry RNGs from the random123 C library (there's a paper describing the ideas there, too).

There's a reason my library is unreleased, though: while the actual RNG implementation is done and seems to produce the same results as the C version, I've been undecided what Haskell interface to use, and the library is hardly documented. Feel free to contact me if you need more info or help using it, though.

like image 7
kosmikus Avatar answered Nov 05 '22 05:11

kosmikus


My guess is that mersenne-random isn't thread safe, so using any unsafe... and parallelization will lead you to problems with calling it from multiple threads. (See also the GHC manual Section 8.2.4.1.)

Functions that need randomness aren't pure, they need some source of randomness, which is either external (hardware - like a device sampling noise) and hence bound to IO, or pseudorandom, which needs to keep some state during the computation. Either way they can't be pure Haskell functions.

I'd start with separating your requirements to a specific monad type class, for example something like

class Monad m => MonadParRand m where
    random      :: MTRandom a => m a
    parSequence :: [m a] -> m [a]

which will allow you to write your main code without being bound to a specific implementation. Or if you're feeling bold you could use monad-parallel and define something like

class MonadParallel m => MonadParRand m where
    random      :: MTRandom a => m a

Now the tricky part is how to define both random and parSequence (or MonadParallel's bindM2) and making it fast. Since you're in control of bindM2, you can manage how threads are spawned and what state they keep. So you can bind a buffer to each thread from which it draws randomized numbers. If the buffer is empty, it makes a synchronized calls to mersenne-random (or another IO-based number generator), fills the buffer and proceeds.

(If you implement something like that, it'd be very nice to make a standalone library from that.)


Note that randoms from mersenne-random already uses unsafeInterleaveIO to produce a lazy list, but I'd say the list is meant to be consumed only from a single thread. And it has also room for improvements. It uses unsafeInterleaveIO and it has some overhead, as noted in its comment:

There are real overheads here. Consider eagerly filling chunks and extracting elements piecewise.

like image 5
Petr Avatar answered Nov 05 '22 05:11

Petr