Seeding random number generators in parallel programs

Tags:

python

random

multiprocessing

numpy

I am studying the multiprocessing module of Python. I have two cases:

Ex. 1

def Foo(nbr_iter):
    for step in xrange(int(nbr_iter)) :
        print random.uniform(0,1)
...

from multiprocessing import Pool

if __name__ == "__main__":
    ...
    pool = Pool(processes=nmr_parallel_block)
    pool.map(Foo, nbr_trial_per_process)

Ex 2. (using numpy)

 def Foo_np(nbr_iter):
     np.random.seed()
     print np.random.uniform(0,1,nbr_iter)

In both cases the random number generators are seeded in their forked processes.

Why do I have to do the seeding explicitly in the numpy example, but not in the Python example?

716

asked Apr 24 '15 17:04

overcomer

3 Answers

If no seed is provided explicitly, numpy.random seeds itself from an OS-dependent source of randomness: usually /dev/urandom on Unix-based systems (or a Windows equivalent), falling back to the wall clock if that is unavailable. This self-seeding happens once, when the global generator is created in the parent process. Each forked subprocess inherits a copy of that generator state, so multiple subprocesses can start from the same state and produce identical random variates.

The number of identical results often correlates with the number of worker processes you are running. For example:

import numpy as np
from multiprocessing import Pool

def Foo_np(seed=None):
    # np.random.seed(seed)  # left commented out to show the problem
    return np.random.uniform(0, 1, 5)

if __name__ == "__main__":
    pool = Pool(processes=8)
    print(np.array(pool.map(Foo_np, range(20))))

# [[ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]
#  [ 0.28917586  0.40997875  0.06308188  0.71512199  0.47386047]
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]]

You can see that groups of up to 8 worker processes started with the same generator state, giving identical random sequences (I've marked the first group with arrows).
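Identical rows are what you get whenever two generators start from the same internal state, and that can be seen without forking anything. The sketch below is only an illustration, not part of the run above; the names parent_state, child_a and child_b are made up for the example.

```python
import numpy as np

# Snapshot the state of the global generator, much as a fork would copy it.
parent_state = np.random.get_state()

# Two stand-in "children" that both start from the copied state.
child_a = np.random.RandomState()
child_a.set_state(parent_state)
child_b = np.random.RandomState()
child_b.set_state(parent_state)

draws_a = child_a.uniform(0, 1, 5)
draws_b = child_b.uniform(0, 1, 5)

# Same state in, same "random" numbers out.
print(np.array_equal(draws_a, draws_b))
```

Reseeding either child, for example with child_b.seed(), would break the tie.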

Calling np.random.seed() within a subprocess forces that process's global RNG instance to seed itself again from /dev/urandom or the wall clock, which will (probably) prevent you from seeing identical output from multiple subprocesses. Best practice is to explicitly pass a different seed (or numpy.random.RandomState instance) to each subprocess, e.g.:

def Foo_np(seed=None):
    local_state = np.random.RandomState(seed)
    print(local_state.uniform(0, 1, 5))

pool.map(Foo_np, range(20))

I'm not entirely sure what underlies the differences between random and numpy.random in this respect (perhaps random has slightly different rules for selecting a source of randomness to self-seed with). I would still recommend explicitly passing a seed or a random.Random instance to each subprocess to be on the safe side. In Python 2 you could also use the .jumpahead() method of random.Random, which is designed for shuffling the states of Random instances in multithreaded programs; it was removed in Python 3.
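For the standard-library module, passing an explicit seed to each task might look like the sketch below. It is a minimal example under my own assumptions: the function name draw and the seed values 1000 to 1003 are hypothetical, chosen only for illustration.

```python
import random
from multiprocessing import Pool


def draw(seed, n=3):
    """Return n uniform variates from a generator private to this call."""
    rng = random.Random(seed)  # independent of the module-level generator
    return [rng.uniform(0, 1) for _ in range(n)]


def main():
    seeds = [1000 + i for i in range(4)]  # hypothetical: one seed per task
    with Pool(processes=4) as pool:
        for seed, values in zip(seeds, pool.map(draw, seeds)):
            print(seed, values)


if __name__ == "__main__":
    main()
```

Because each task builds its own random.Random, the result depends only on the seed it was given and not on which worker happened to run it.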

121

answered Oct 19 '22 22:10

ali_m

NumPy 1.17 introduced (quoting the documentation) "three strategies implemented that can be used to produce repeatable pseudo-random numbers across multiple processes (local or distributed)".

The first strategy uses a SeedSequence object. There are many parent/child options there, but for our case, if you want every process to generate the same random numbers, with different numbers on each run:

(Python 3, printing 3 random numbers from 4 processes)

from numpy.random import SeedSequence, default_rng
from multiprocessing import Pool

def rng_mp(rng):
    return [ rng.random() for i in range(3) ]

seed_sequence = SeedSequence()
n_proc = 4
pool = Pool(processes=n_proc)
pool.map(rng_mp, [ default_rng(seed_sequence) for i in range(n_proc) ])

# 2 different runs
[[0.2825724770857644, 0.6465318335272593, 0.4620869345284885],
 [0.2825724770857644, 0.6465318335272593, 0.4620869345284885],
 [0.2825724770857644, 0.6465318335272593, 0.4620869345284885],
 [0.2825724770857644, 0.6465318335272593, 0.4620869345284885]]

[[0.04503760429109904, 0.2137916986051025, 0.8947678672387492],
 [0.04503760429109904, 0.2137916986051025, 0.8947678672387492],
 [0.04503760429109904, 0.2137916986051025, 0.8947678672387492],
 [0.04503760429109904, 0.2137916986051025, 0.8947678672387492]]

If you want the same result on every run, for reproducibility, you can simply reseed numpy with the same seed (17):

import numpy as np
from multiprocessing import Pool

def rng_mp(seed):
    np.random.seed(seed)
    return [ np.random.rand() for i in range(3) ]

n_proc = 4
pool = Pool(processes=n_proc)
pool.map(rng_mp, [17] * n_proc)

# same results each run:
[[0.2946650026871097, 0.5305867556052941, 0.19152078694749486],
 [0.2946650026871097, 0.5305867556052941, 0.19152078694749486],
 [0.2946650026871097, 0.5305867556052941, 0.19152078694749486],
 [0.2946650026871097, 0.5305867556052941, 0.19152078694749486]]
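Both snippets above hand every process the same stream. If each process should instead draw different numbers while the whole run stays repeatable, SeedSequence.spawn can derive one child seed per process. This is a sketch under my own assumptions: the helper make_rngs is hypothetical, and the root seed 17 is arbitrary.

```python
from multiprocessing import Pool

from numpy.random import SeedSequence, default_rng


def rng_mp(rng):
    # Each worker receives its own Generator and draws from it.
    return [rng.random() for _ in range(3)]


def make_rngs(root_seed, n):
    """Hypothetical helper: n Generators derived from one root seed."""
    children = SeedSequence(root_seed).spawn(n)  # one child seed per process
    return [default_rng(child) for child in children]


def main():
    n_proc = 4
    with Pool(processes=n_proc) as pool:
        for row in pool.map(rng_mp, make_rngs(17, n_proc)):
            print(row)


if __name__ == "__main__":
    main()
```

Calling SeedSequence() with no argument instead of a fixed root seed would give different streams on every run as well as in every process.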

answered Oct 19 '22 23:10

mork

Here is a nice blog post that explains the way numpy.random works.

If you use np.random.rand(), it uses the seed created when you imported the np.random module. So you need to create a new seed in each process manually (see the examples in the blog post).

The Python random module does not have this issue and automatically generates a different seed for each process.

answered Oct 19 '22 22:10

t_sic
