Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to share numpy random state of a parent process with child processes?

I set numpy random seed at the beginning of my program. During the program execution I run a function multiple times using multiprocessing.Process. The function uses numpy random functions to draw random numbers. The problem is that Process gets a copy of the current environment. Therefore, each process is running independently and they all start with the same random seed as the parent environment.

So my question is how can I share the random state of numpy in the parent environment with the child process environment? Just note that I want to use Process for my work and need to use a separate class and do import numpy in that class separately. I tried using multiprocessing.Manager to share the random state but it seems that things do not work as expected and I always get the same results. Also, it does not matter if I move the for loop inside drawNumpySamples or leave it in main.py; I still cannot get different numbers and the random state is always the same. Here's a simplified version of my code:

# randomClass.py
import numpy as np
class myClass(self):
    def __init__(self, randomSt):
        print ('setup the object')
        np.random.set_state(randomSt)
    def drawNumpySamples(self, idx)
        np.random.uniform()

And in the main file:

    # main.py
    import numpy as np
    from multiprocessing import Process, Manager
    from randomClass import myClass

    np.random.seed(1) # set random seed
    mng = Manager()
    randomState = mng.list(np.random.get_state())
    myC = myClass(randomSt = randomState)

    for i in range(10):
        myC.drawNumpySamples() # this will always return the same results

Note: I use Python 3.5. I also posted an issue on Numpy's GitHub page. Just sending the issue link here for future reference.

like image 888
Amir Avatar asked Mar 19 '18 21:03

Amir


2 Answers

Even if you manage to get this working, I don’t think it will do what you want. As soon as you have multiple processes pulling from the same random state in parallel, it’s no longer deterministic which order they each get to the state, meaning your runs won’t actually be repeatable. There are probably ways around that, but it seems like a nontrivial problem.

Meanwhile, there is a solution that should solve both the problem you want and the nondeterminism problem:

Before spawning a child process, ask the RNG for a random number, and pass it to the child. The child can then seed with that number. Each child will then have a different random sequence from other children, but the same random sequence that the same child got if you rerun the entire app with a fixed seed.

If your main process does any other RNG work that could depend non-deterministically on the execution of the children, you'll need to pre-generate the seeds for all of your child processes, in order, before pulling any other random numbers.


As senderle pointed out in a comment: If you don't need multiple distinct runs, but just one fixed run, you don't even really need to pull a seed from your seeded RNG; just use a counter starting at 1 and increment it for each new process, and use that as a seed. I don't know if that's acceptable, but if it is, it's hard to get simpler than that.

As Amir pointed out in a comment: a better way is to draw a random integer every time you spawn a new process and pass that random integer to the new process to set the numpy's random seed with that integer. This integer can indeed come from np.random.randint().

like image 73
abarnert Avatar answered Nov 14 '22 22:11

abarnert


You need to update the state of the Manager each time you get a random number:

import numpy as np
from multiprocessing import Manager, Pool, Lock

lock = Lock()
mng = Manager()
state = mng.list(np.random.get_state())

def get_random(_):
    with lock:
        np.random.set_state(state)
        result = np.random.uniform()
        state[:] = np.random.get_state()
        return result

np.random.seed(1)
result1 = Pool(10).map(get_random, range(10))

# Compare with non-parallel version
np.random.seed(1)
result2 = [np.random.uniform() for _ in range(10)]

# result of Pool.map may be in different order
assert sorted(result1) == sorted(result2)
like image 41
Alex Hall Avatar answered Nov 14 '22 21:11

Alex Hall