Pseudo-random number generator for cluster environment

Tags: random, parallel-processing, prng, mersenne-twister

How can I generate independent pseudo-random numbers on a cluster, for Monte Carlo simulation for example? I can have many compute nodes (e.g. 100), and I need to generate millions of numbers on each node. I need a guarantee that the PRN sequence on one node will not overlap the PRN sequence on another node.

I could generate all PRNs on the root node, then send them to the other nodes. But it would be far too slow.
I could jump to a known distance in the sequence on each node. But is there such an algorithm for Mersenne Twister, or for any other good PRNG, that runs in a reasonable amount of time and memory?
I could use a different generator on each node. But is that possible with good generators like Mersenne Twister? How could it be done?
Any other thoughts?


asked Jun 16 '11 04:06

Charles Brunet

1 Answer

You should never use potentially overlapping random streams obtained from the same original stream. If you have not tested the resulting interleaved stream, you have no idea of its statistical quality.

Fortunately, the Mersenne Twister (MT) comes with a dedicated tool for this distribution task: the Dynamic Creator (DC hereafter). DC creates distinct MT generators designed to produce highly independent random streams.

Each stream is created on the node that uses it. Think of DC as a constructor, in the object-oriented sense, that creates different instances of MT. Each instance has its own parameters and is designed to produce a highly independent random sequence.
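To make the constructor analogy concrete, here is a minimal C sketch, not DC's own code. It assumes a hypothetical `mt_params` struct holding the constants that define one 32-bit MT variant (state size, twist constant, tempering shifts and masks) and a hypothetical `mt_new` / `mt_next` / `mt_free` API for an instance built from such a set. Finding new valid parameter sets, one per id, is precisely DC's job and is not reproduced here; the only set filled in below is the published MT19937 one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parameter set for one 32-bit Mersenne Twister variant.
 * DC searches for sets like this (one per id); here one is filled by hand. */
typedef struct {
    int n, m, r;        /* state size, middle offset, separation point */
    uint32_t a;         /* twist matrix constant */
    int u, s, t, l;     /* tempering shifts */
    uint32_t b, c;      /* tempering masks */
} mt_params;

/* One generator instance: its own parameters plus its own state. */
typedef struct {
    mt_params p;
    uint32_t *state;
    int i;              /* index of the next state word to output */
} mt_instance;

/* Hypothetical "constructor": allocates the state and seeds it with the
 * same recurrence as the reference MT19937 init_genrand(). */
static mt_instance *mt_new(const mt_params *p, uint32_t seed)
{
    assert(p->n > p->m && p->m > 0 && p->r > 0 && p->r < 32);
    mt_instance *g = malloc(sizeof *g);
    if (!g) return NULL;
    g->state = malloc((size_t)p->n * sizeof *g->state);
    if (!g->state) { free(g); return NULL; }
    g->p = *p;
    g->state[0] = seed;
    for (int k = 1; k < p->n; k++)
        g->state[k] = 1812433253u * (g->state[k - 1] ^ (g->state[k - 1] >> 30))
                      + (uint32_t)k;
    g->i = p->n;        /* forces a full twist on the first call */
    return g;
}

static void mt_free(mt_instance *g)
{
    if (g) { free(g->state); free(g); }
}

/* Returns the next 32-bit output of this instance. */
static uint32_t mt_next(mt_instance *g)
{
    const mt_params *p = &g->p;
    const uint32_t lower = (1u << p->r) - 1u;  /* low r bits */
    const uint32_t upper = ~lower;             /* high 32 - r bits */
    if (g->i >= p->n) {                        /* regenerate the whole state */
        for (int k = 0; k < p->n; k++) {
            uint32_t y = (g->state[k] & upper) | (g->state[(k + 1) % p->n] & lower);
            g->state[k] = g->state[(k + p->m) % p->n] ^ (y >> 1) ^ ((y & 1u) ? p->a : 0u);
        }
        g->i = 0;
    }
    uint32_t y = g->state[g->i++];
    y ^= y >> p->u;                            /* tempering */
    y ^= (y << p->s) & p->b;
    y ^= (y << p->t) & p->c;
    y ^= y >> p->l;
    return y;
}

/* Published MT19937 constants: one valid parameter set among many. */
static const mt_params MT19937_PARAMS = {
    624, 397, 31, 0x9908B0DFu, 11, 7, 15, 18, 0x9D2C5680u, 0xEFC60000u
};
```

With `MT19937_PARAMS` and seed 5489, the first output is 3499211612 and the 10000th is 4123659995, matching the reference MT19937. On a cluster, each node would instead construct its own instance from the parameter set DC produced for that node's id. Giving every node `MT19937_PARAMS` with a different seed would bring back the situation warned against above: all streams would be pieces of the same original sequence.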

You can find DC here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html
It is quite straightforward to use, and it lets you set parameters such as the number of MT instances you want to obtain or the period of these MTs. DC's run time depends on these input parameters.

In addition to the README that comes with DC, take a look at the file example/new_example2.c in the DC archive. It shows example calls that obtain independent sequences from different input identifiers, which map naturally onto the identifiers you already use for cluster jobs.

Finally, if you want to learn more about using PRNGs in parallel or distributed environments, I suggest reading this scientific article:

David R.C. Hill, "Practical distribution of random streams for stochastic High Performance Computing", International Conference on High Performance Computing and Simulation (HPCS), 2010.

answered Sep 27 '22 22:09

jopasserat

Related questions

cumulative weights in random.choices
Uniform distribution from a fractal Perlin noise function in C#
Proper way to generate a random float given a binary random number generator?
Random "walk" around a central location in a limited area?
Is there some pattern on Random() method?
Pseudo random number generator gives same first output but then behaves as expected
How can I get a Random URL on http request for Gatling?
Generating random values from uniform distribution with setting a seed in T-SQL
R - random distribution with predefined min, max, mean, and sd values
Why does random work like this in Ruby?
Is reading /dev/urandom thread-safe?
How long does the stream of Random().Next() take until it repeats?
How do I use chi square distribution with C++ Boost library?
How do I generate (and label) a random integer with python 3.2?
How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?
Algorithm for "smooth" random numbers
Optimizing the use of the C++11 random generator
Generate unique random numbers in Postgresql with fixed length
Difference between RandomState and seed in numpy
Emacs - random color theme every hour?
