I have a random walk function that uses numpy.random to do the random step. The function walk works just fine by itself, and in most parallel cases it works as expected; in conjunction with multiprocessing, however, it fails. Why does multiprocessing get it wrong?
import numpy as np

def walk(x, n=100, box=.5, delta=.2):
    "perform a random walk"
    w = np.cumsum(x + np.random.uniform(-delta, delta, n))
    w = np.where(abs(w) > box)[0]
    return w[0] if len(w) else n

N = 10
# run N trials, all starting from x=0
pwalk = np.vectorize(walk)
print(pwalk(np.zeros(N)))
# run again, using a list comprehension instead of a ufunc
print([walk(0) for i in range(N)])
# run again, using multiprocessing's map
import multiprocessing as mp
p = mp.Pool()
print(p.map(walk, [0]*N))
The results are typically something like this:
[47 16 72 8 15 4 38 52 12 41]
[7, 45, 25, 13, 16, 19, 12, 30, 23, 4]
[3, 3, 3, 3, 3, 3, 3, 14, 3, 14]
The first two methods clearly show randomness, while the last one doesn't.
What's going on, so that multiprocessing doesn't get it right? If you add a sleep so it's a sleepwalk and there's significant delay, multiprocessing still gets it wrong. However, if you replace the call to np.random.uniform with a non-array method like [(random.random()-.5) for i in range(n)], then it works as expected. So why don't numpy.random and multiprocessing play nice?
What's going on, so that multiprocessing doesn't get it right?
Each worker the Pool forks inherits a copy of the parent's global NumPy RNG state, so every worker draws the exact same pseudo-random sequence. (multiprocessing reseeds the stdlib random module in each child process, which is why your random.random version works, but it knows nothing about NumPy's RNG.) You need to reseed in each process to make sure the pseudo-random streams are independent of one another. I use os.urandom to generate the seeds.