Same output in different workers in multiprocessing

I have some very simple cases where the work can be split up and distributed among workers. I tried a very simple multiprocessing example from here:

import multiprocessing
import numpy as np
import time

def do_calculation(data):
    # Pick a random delay (0-9 seconds) using NumPy's global RNG
    rand = np.random.randint(10)
    print(data, rand)
    time.sleep(rand)
    return data * 2

if __name__ == '__main__':
    # Start twice as many workers as there are CPUs
    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(processes=pool_size)

    inputs = list(range(10))
    print('Input   :', inputs)

    pool_outputs = pool.map(do_calculation, inputs)
    print('Pool    :', pool_outputs)

The program above produces the following output:

Input   : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
0 7
1 7
2 7
5 7
3 7
4 7
6 7
7 7
8 6
9 6
Pool    : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Why is the same random number printed by different workers? (My machine has 4 CPUs.) Is this the best/simplest way to go about it?

926

asked Oct 16 '12 12:10

imsc

2 Answers

I think you need to re-seed the random number generator by calling numpy.random.seed() inside your do_calculation function.

My guess is that NumPy's global random number generator (RNG) is seeded once, when the module is imported. When you then use multiprocessing, the current process is forked with the RNG already seeded, so every worker starts from a copy of the same RNG state and generates the same sequence of numbers. That would also explain why inputs 8 and 9 print 6: they were presumably handled by workers that had already drawn a 7, so they received the second number of that same sequence.

For example:

def do_calculation(data):
    # Re-seed the global RNG from OS entropy inside the worker
    np.random.seed()
    rand = np.random.randint(10)
    print(data, rand)
    return data * 2
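
To illustrate the reasoning above without starting any processes, here is a minimal sketch that imitates what a fork does to the RNG: it snapshots the global generator's state and lets two simulated workers start from that same snapshot. The names draw_from_state, inherited_state, worker_a and worker_b are made up for this illustration.

```python
import numpy as np

def draw_from_state(state, n=5):
    """Restore a saved global RNG state, then draw n integers from it."""
    np.random.set_state(state)
    return np.random.randint(10, size=n).tolist()

# Snapshot the global generator's state, much as a fork would copy it.
inherited_state = np.random.get_state()

# Two simulated workers, each starting from the identical inherited state.
worker_a = draw_from_state(inherited_state)
worker_b = draw_from_state(inherited_state)

print(worker_a == worker_b)  # True: same starting state, same sequence
```

Calling np.random.seed() in each worker breaks this symmetry, because every process then replaces the inherited state with a fresh one.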

149

answered Oct 02 '22 17:10

mgilson

This blog post gives an example of good and bad practice when using numpy.random with multiprocessing. The most important thing is to understand when the seed of your pseudo-random number generator (PRNG) is created:

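import numpy as np
import pprint
from multiprocessing import Pool

pp = pprint.PrettyPrinter()

def bad_practice(index):
    # Uses the global RNG, whose state every forked worker inherits
    return np.random.randint(0, 10, size=10)

def good_practice(index):
    # Creates a fresh RandomState, seeded from OS entropy, on every call
    return np.random.RandomState().randint(0, 10, size=10)

if __name__ == '__main__':
    # The guard keeps worker processes from re-running the pool setup
    p = Pool(5)

    pp.pprint("Bad practice: ")
    pp.pprint(p.map(bad_practice, range(5)))
    pp.pprint("Good practice: ")
    pp.pprint(p.map(good_practice, range(5)))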

Output:

'Bad practice: '
[array([4, 2, 8, 0, 1, 1, 6, 1, 2, 9]),
 array([4, 2, 8, 0, 1, 1, 6, 1, 2, 9]),
 array([4, 2, 8, 0, 1, 1, 6, 1, 2, 9]),
 array([4, 2, 8, 0, 1, 1, 6, 1, 2, 9]),
 array([4, 2, 8, 0, 1, 1, 6, 1, 2, 9])]
'Good practice: '
[array([8, 9, 4, 5, 1, 0, 8, 1, 5, 4]),
 array([5, 1, 3, 3, 3, 0, 0, 1, 0, 8]),
 array([1, 9, 9, 9, 2, 9, 4, 3, 2, 1]),
 array([4, 3, 6, 2, 6, 1, 2, 9, 5, 2]),
 array([6, 3, 5, 9, 7, 1, 7, 4, 8, 5])]

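In the good practice, a new RandomState is created and seeded from OS entropy on every call, so each worker process draws its own numbers. In the bad practice, the global generator is seeded only once, when the numpy.random module is imported in the parent process, and every forked worker inherits that same state.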
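
If the results also need to be reproducible from run to run, one possible variation is to derive a separate seed for each task from a fixed root seed. This is only a sketch and not something taken from the answers here: ROOT_SEED, draw and the pool size are illustrative choices, and it assumes NumPy 1.17 or later, where default_rng is available and accepts a sequence of integers as a seed.

```python
import numpy as np
from multiprocessing import Pool

ROOT_SEED = 12345  # hypothetical fixed seed; change it to get a different run

def draw(index):
    # Mix the task index with the root seed so each task gets its own stream
    rng = np.random.default_rng([index, ROOT_SEED])
    return rng.integers(0, 10, size=10).tolist()

if __name__ == '__main__':
    with Pool(5) as p:
        for row in p.map(draw, range(5)):
            print(row)
```

Because the seed depends only on the task index and the root seed, the result for a given task does not depend on which worker happens to run it.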

answered Oct 02 '22 19:10

t_sic

Tags: python, parallel-processing, multiprocessing
