
Why don't numpy.random and multiprocessing play nice? [duplicate]

I have a random walk function that uses numpy.random to take the random steps. The function walk works just fine by itself. Run in parallel, it also works as expected in most cases, but in conjunction with multiprocessing it fails. Why does multiprocessing get it wrong?

import numpy as np

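def walk(x, n=100, box=.5, delta=.2):
    "perform a random walk"
    # cumulative sum of n uniform steps in [-delta, delta), each offset by x
    w = np.cumsum(x + np.random.uniform(-delta,delta,n))
    # indices of the steps at which the walker is outside the box
    w = np.where(abs(w) > box)[0]
    # first exit step, or n if the walker never leaves the box
    return w[0] if len(w) else n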

N = 10

# run N trials, all starting from x=0
pwalk = np.vectorize(walk)
print(pwalk(np.zeros(N)))

# run again, using list comprehension instead of ufunc
print([walk(0) for i in range(N)])

# run again, using multiprocessing's map
import multiprocessing as mp
p = mp.Pool()
print(p.map(walk, [0]*N))

The results are typically something like:

[47 16 72  8 15  4 38 52 12 41]
[7, 45, 25, 13, 16, 19, 12, 30, 23, 4]
[3, 3, 3, 3, 3, 3, 3, 14, 3, 14]

The first two methods obviously show randomness, while the last doesn't. What's going on, so that multiprocessing doesn't get it right?

If you add a sleep so it's a sleepwalk and there's significant delay, multiprocessing still gets it wrong.

However, if you replace the call to np.random.uniform with a non-array method like [(random.random()-.5) for i in range(n)], then it works as expected.

So why don't numpy.random and multiprocessing play nice?

Mike McKerns asked Jun 21 '14 20:06


1 Answer

What's going on, so that multiprocessing doesn't get it right?

You need to reseed in each process to make sure the pseudo-random streams are independent of one another.

I use os.urandom to generate the seeds.
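
A minimal sketch of that idea, not necessarily how the answer's author does it. It assumes a fork-based platform such as Linux, where every worker starts with a copy of the parent's numpy.random state, which would explain the repeated 3s in the question. The helper name reseed is hypothetical; walk is the function from the question. The helper is passed as the Pool initializer, so it runs once in each worker before any task is handed out.

```python
import os
import struct
import multiprocessing as mp

import numpy as np


def reseed():
    """Seed numpy's global generator in the current process (hypothetical helper)."""
    # 4 bytes from the OS entropy source -> unsigned 32-bit integer,
    # which is within the range np.random.seed accepts.
    seed = struct.unpack("I", os.urandom(4))[0]
    np.random.seed(seed)


def walk(x, n=100, box=.5, delta=.2):
    "perform a random walk"
    w = np.cumsum(x + np.random.uniform(-delta, delta, n))
    w = np.where(abs(w) > box)[0]
    return w[0] if len(w) else n


if __name__ == "__main__":
    N = 10
    # initializer is called once in every worker process right after it starts
    p = mp.Pool(initializer=reseed)
    print(p.map(walk, [0] * N))
    p.close()
    p.join()
```

With the initializer in place, the workers no longer start from the same generator state, so the mapped results should vary the way the first two methods do. Calling np.random.seed with the same value twice reproduces the same walk, which is the effect the workers were showing before.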

Raymond Hettinger answered Sep 29 '22 15:09