How can I generate independent pseudo-random numbers on a cluster, for Monte Carlo simulation for example? I may have many compute nodes (e.g. 100), and I need to generate millions of numbers on each node. I need a guarantee that the PRN sequence on one node will not overlap the PRN sequence on another node.
A pseudo-random number generator (PRNG) is a deterministic algorithm used in probability and statistics applications when large quantities of random numbers are needed. Starting from an initial value called the seed, it produces a long, reproducible sequence of numbers that approximates the properties of a truly random one.
You can think of the seed as the snippet of randomness that selects where the sequence starts. Linear congruential generators were long the usual choice, but the Mersenne Twister has since largely replaced them and remains one of the most widely used PRNG algorithms today.
You should never use potentially overlapping random streams obtained from the same original stream. If you have not tested the resulting interleaved stream, you have no idea of its statistical quality.
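To make the overlap hazard concrete, here is a minimal sketch using only Python's standard library (CPython's `random.Random` is a Mersenne Twister). The seed, offsets and lengths are made-up values for illustration: two "nodes" read from the same original stream at different starting offsets, and because node A draws more numbers than the gap between the offsets, part of its output is reproduced exactly by node B.

```python
import random

def substream(seed, offset, length):
    """Return `length` 32-bit outputs of one master stream, starting at `offset`."""
    rng = random.Random(seed)      # same seed => same underlying stream
    for _ in range(offset):        # skip ahead by discarding outputs
        rng.getrandbits(32)
    return [rng.getrandbits(32) for _ in range(length)]

MASTER_SEED = 12345                # hypothetical value, for illustration only

# Node A starts at offset 0, node B at offset 600; each draws 1000 numbers.
node_a = substream(MASTER_SEED, offset=0, length=1000)
node_b = substream(MASTER_SEED, offset=600, length=1000)

# Node B's first 400 draws are exactly node A's last 400 draws.
streams_overlap = node_a[600:] == node_b[:400]
print(streams_overlap)             # prints True
```

In a real simulation the number of draws per node is rarely known in advance, which is why spacing sub-streams by a fixed offset gives no guarantee by itself.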
Fortunately, Mersenne Twister (MT) will help you in your distribution task. Using its dedicated algorithm, called Dynamic Creator (DC hereafter), you can create independent random number generators that will produce highly independent random streams.
Each stream will be created on the node that will be using it. Basically, think of DC as a constructor in the object-oriented paradigm that creates different instances of MT. Each instance is designed to produce a random sequence that is highly independent of the others.
You can find DC here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html
It's quite straightforward to use, and you'll be able to set different parameters such as the number of different MT instances you want to obtain or the period of these MTs. Depending on its input parameters, DC's runtime will change.
In addition to the README that comes with DC, take a look at the file example/new_example2.c in the DC archive. It shows example calls that obtain independent sequences from different input identifiers, which is basically what you need in order to identify cluster jobs.
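As a rough sketch of the bookkeeping side only (it does not call DC itself), the snippet below maps each cluster job to a distinct identifier that the job could then pass to DC. The function name `dc_id_for_job` and the constant `MAX_DC_ID` are hypothetical, and the 16-bit upper bound on identifiers is an assumption that should be checked against the DC README.

```python
MAX_DC_ID = 0xFFFF  # assumption: DC identifiers fit in 16 bits; verify in the README

def dc_id_for_job(job_index, n_jobs):
    """Map a 0-based job index to a unique identifier to hand to DC."""
    if n_jobs - 1 > MAX_DC_ID:
        raise ValueError("more jobs than available identifiers")
    if not 0 <= job_index < n_jobs:
        raise ValueError("job_index out of range")
    # The job index itself is already unique, so it can serve as the identifier.
    return job_index

# Example: 100 compute nodes, one job per node.
ids = [dc_id_for_job(i, 100) for i in range(100)]
all_distinct = len(set(ids)) == len(ids)
print(all_distinct)  # prints True
```

The point is simply that no two jobs may ever share an identifier; how the job index is obtained (scheduler variable, MPI rank, command-line argument) depends on your cluster setup.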
Finally, if you intend to learn more about how to use PRNGs in parallel or distributed environments, I suggest you read this scientific article:
Practical distribution of random streams for stochastic High Performance Computing, David RC Hill, in International Conference on High Performance Computing and Simulation (HPCS), 2010