
Using python multiprocessing with different random seed for each process

I wish to run several instances of a simulation in parallel, but with each simulation having its own independent data set.

Currently I implement this as follows:

P = mp.Pool(ncpus)  # Generate pool of workers
for j in range(nrun):  # Generate processes
    sim = MDF.Simulation(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, savetemp)
    lattice = MDF.Lattice(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, kb, ks, kbs, a, p, q, massL, randinit, initvel, parangle, scaletemp, savetemp)
    adatom1 = MDF.Adatom(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, ra, massa, amorse, bmorse, r0, z0, name, lattice, samplerate, savetemp)
    P.apply_async(run, (j, sim, lattice, adatom1), callback=After)  # run simulation and ISF analysis in each process
P.close()  # no more tasks will be submitted
P.join()  # wait for all worker processes to finish

where sim, adatom1 and lattice are objects passed to the function run which initiates the simulation.

However, I recently found out that each batch of runs executed simultaneously (that is, each group of ncpus runs out of the nrun total) gives exactly the same results.

Can someone explain how to fix this?

Mickey Diamant asked Feb 09 '12 10:02




2 Answers

Just thought I would add an actual answer to make it clear for others.

Quoting the answer from aix in this question:

What happens is that on Unix every worker process inherits the same state of the random number generator from the parent process. This is why they generate identical pseudo-random sequences.

Call random.seed() (or the scipy/numpy equivalent) inside each worker process so that every worker gets its own seed. See also this numpy thread.
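As a minimal sketch of that advice (not the original poster's code), the worker function below reseeds the standard-library generator from the run index before drawing any numbers. The names run and BASE_SEED, the pool size of 4 and the 8 runs are hypothetical placeholders. Consecutive integer seeds give different sequences, but this sketch makes no claim about their statistical independence.

```python
import random
from multiprocessing import Pool

BASE_SEED = 12345  # hypothetical fixed base seed; any integer works


def run(j):
    """Hypothetical stand-in for the simulation function."""
    # Reseed inside the worker, using the run index, so that no task
    # reuses the generator state inherited from the parent process.
    random.seed(BASE_SEED + j)
    return [random.random() for _ in range(3)]


if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(run, range(8))
    for j, values in enumerate(results):
        print(j, values)
```

Because the seed depends only on BASE_SEED and j, rerunning the script reproduces the same numbers for each run. With NumPy, the analogous call inside run would be np.random.seed(BASE_SEED + j).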

dgorissen answered Sep 19 '22 11:09


This is an unsolved problem. Try to generate a unique seed for each process. You can add the code below to the beginning of your function to work around the issue.

np.random.seed((os.getpid() * int(time.time())) % 123456789) 
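
One caveat with the one-liner above: if the same worker process starts two tasks within the same second, both the PID and the truncated timestamp are unchanged, so both tasks get the same seed. A possible alternative, sketched here with the standard-library random module, is to draw the seed from the operating system with os.urandom and return it with the result so a run can be repeated. The names fresh_seed and run_with_fresh_seed, the pool size and the number of runs are hypothetical.

```python
import os
import random
from multiprocessing import Pool


def fresh_seed():
    """Return a 32-bit seed drawn from the operating system's entropy source."""
    # 4 bytes give an integer in [0, 2**32 - 1], which is also the range
    # that np.random.seed() accepts for an integer seed.
    return int.from_bytes(os.urandom(4), "big")


def run_with_fresh_seed(j):
    """Hypothetical stand-in for the simulation function."""
    seed = fresh_seed()
    random.seed(seed)  # with NumPy: np.random.seed(seed)
    # Return the seed as well, so this run can be reproduced later.
    return j, seed, random.random()


if __name__ == "__main__":
    with Pool(4) as pool:
        for j, seed, value in pool.map(run_with_fresh_seed, range(8)):
            print(j, seed, value)
```

Logging the seed is the design choice that matters here: the seeds are not predictable in advance, but any single run can be replayed by passing its recorded seed to random.seed().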
alercelik answered Sep 20 '22 11:09