Python Multiprocessing Numpy Random [duplicate]

Does the scope of a numpy ndarray function differently within a function called by multiprocessing? Here is an example:

Using Python's multiprocessing module, I am calling a function like so:

import multiprocessing as mp

jobs = []
for core in range(cores):
    # target could be f() or g()
    # args must be a tuple, hence the trailing comma
    proc = mp.Process(target=f, args=(core,))
    jobs.append(proc)
for job in jobs:
    job.start()
for job in jobs:
    job.join()

import random
import numpy as np

def f(core):
    x = 0
    x += random.randint(0, 10)
    print(x)

def g(core):
    # Assume an array with 4 columns and n rows
    local = np.copy(globalshared_array[:, core])
    shuffled = np.random.permutation(local)

When f(core) is the target, the x variable is local to each process, i.e. each process prints a different random integer, as expected. The values never exceed 10, indicating that x starts at 0 in each process. Is that correct?

When g(core) is the target, permuting a copy of the array returns 4 identically 'shuffled' arrays. This seems to indicate that the working copy is not local to the child process. Is that correct? If so, other than using shared memory space, is it possible to have an ndarray be local to the child process when it needs to be filled from shared memory space?

EDIT:

Altering g(core) to add a random integer appears to have the desired effect: the arrays show different values. Something must be occurring in permutation that orders the columns (local to each child process) identically. Ideas?

def g(core):
    #Assume an array with 4 columns and n rows
    local = np.copy(globalshared_array[:,core])
    local += random.randint(0,10)

EDIT II: np.random.shuffle exhibits the same behavior. The contents of the array are shuffled, but into the same order on each core.

Jzl5325 asked Jan 15 '23 10:01


2 Answers

Calling g(core) and permuting a copy of the array returns 4 identically 'shuffled' arrays. This seems to indicate that the working copy is not local to the child process.

What it likely indicates is that the random number generator is initialized identically in each child process, producing the same sequence. You need to seed each child's generator (perhaps throwing the child's process id into the mix).
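
The effect can be reproduced without starting any processes. The sketch below is only an illustration, assuming Python 3 and a fork-style start method where each child begins with a copy of the parent's NumPy generator state. The helper simulated_child and the seed values are hypothetical names chosen for the example, not part of the original code. The standard library's random module, by contrast, is reseeded in forked children, which would fit the observation that f() printed different values.

```python
import numpy as np

# Parent generator; a forked child starts with a copy of this state.
parent = np.random.RandomState(12345)
inherited_state = parent.get_state()


def simulated_child(state, seed=None):
    """Return a permutation drawn by a 'child' that inherits `state`.

    If `seed` is given, the child reseeds before drawing (the suggested fix).
    """
    rng = np.random.RandomState()
    rng.set_state(state)   # what the child inherits from the parent
    if seed is not None:
        rng.seed(seed)     # hypothetical per-child seed, e.g. based on the pid
    return rng.permutation(np.arange(20))


# Without reseeding, every child produces the same "shuffle".
unseeded = [simulated_child(inherited_state) for _ in range(4)]

# With a distinct seed per child, the shuffles differ.
seeded = [simulated_child(inherited_state, seed=1000 + core) for core in range(4)]
```

In a real child process the equivalent step would be a call such as np.random.seed(...) with a value that differs per process, made at the top of the target function.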

NPE answered Jan 22 '23 00:01


To seed a random array, this post was most useful. The following g(core) function succeeded in generating a different random array in each process.

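def g(core):
    # _identity is a private attribute; it differs for each child process
    pid = mp.current_process()._identity[0]
    # A generator owned by this process, seeded differently in each child
    randst = np.random.RandomState(pid)
    randarray = randst.randint(0, 100, size=(1, 100))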
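
As a possible alternative to seeding from the process identity, newer NumPy releases (1.17 and later, an assumption about the installed version) provide SeedSequence.spawn for deriving one independent seed per worker. The sketch below assumes Python 3. The names shuffle_column, worker and run_workers, and the shared argument standing in for globalshared_array, are hypothetical and chosen only for illustration.

```python
import multiprocessing as mp

import numpy as np


def shuffle_column(column, seed_seq):
    """Return a shuffled copy of `column` using a generator built from `seed_seq`."""
    rng = np.random.default_rng(seed_seq)
    return rng.permutation(column)  # permutation() leaves `column` untouched


def worker(core, shared, seed_seq, queue):
    # Each child shuffles its own column and reports back to the parent.
    queue.put((core, shuffle_column(shared[:, core], seed_seq)))


def run_workers(shared, root_seed=None):
    """Start one process per column and return {core: shuffled column}."""
    cores = shared.shape[1]
    # spawn() derives one independent child seed per process.
    child_seeds = np.random.SeedSequence(root_seed).spawn(cores)
    queue = mp.Queue()
    jobs = [
        mp.Process(target=worker, args=(core, shared, child_seeds[core], queue))
        for core in range(cores)
    ]
    for job in jobs:
        job.start()
    # Drain the queue before joining so children are not blocked on put().
    results = dict(queue.get() for _ in jobs)
    for job in jobs:
        job.join()
    return results
```

On platforms that start children with spawn rather than fork, run_workers should be called from under an if __name__ == "__main__": guard. Passing a fixed root_seed makes the per-process shuffles reproducible between runs while still differing from one process to the next.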
Jzl5325 answered Jan 22 '23 00:01