Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pseudo-random number generator for cluster environment

How can I generate independent pseudo-random numbers on a cluster, for Monte Carlo simulation for example? I can have many compute nodes (e.g. 100), and I need to generate millions of numbers on each node. I need a warranty that a PRN sequence on one node will not overlap the PRN sequence on another node.

  • I could generate all PRN on root node, then send them to other nodes. But it would be far too slow.
  • I could jump to a know distance in the sequence, on each node. But is there such an algorithm for Mersenne-Twister or for any other good PRNG, which can be done with a reasonable amount of time and memory?
  • I could use different generators on each node. But is it possible with good generators like Mersenne-Twister? How could it be done?
  • Any other though?
like image 698
Charles Brunet Avatar asked Jun 16 '11 04:06

Charles Brunet


People also ask

What are pseudo random number generators used for?

A pseudo-random number generator (PRNG) is a program written for, and used in, probability and statistics applications when large quantities of random digits are needed. Most of these programs produce endless strings of single-digit numbers, usually in base 10, known as the decimal system.

What is the best random number generator algorithm?

You can think of the seed as a snippet of randomness. However, the Mersenne Twister has since replaced linear congruential generators and is still the most popular PRNG algorithm used today.


1 Answers

You should never use potentially overlapping random streams obtained from the same original stream. If you have not tested the resulting interleaved stream, you have no idea of its statistic quality.

Fortunately, Mersenne Twister (MT) will help you in your distribution task. Using its dedicated algorithm, called Dynamic Creator (DC hereafter), you can create independent random number generators that will produce highly independent random streams.

Each stream will be created on the node that will be using it. Basically, think of DC as a constructor in object oriented paradigm that creates different instances of MT. Each different instance is designed to produce highly independent random sequences.

You can find DC here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html
It's quite straightforward to use and you'll be able to fix different parameters such as the number of different MT instances you want to obtain or the period of these MTs. Depending on its input parameter, DC will runtime will change.

In addition of the README coming along with DC, take a look at the file example/new_example2.c in the DC archive. It shows example of calls to get independent sequences given a different input identifier, which is basically what you have to identify cluster jobs.

Finally, if you intend to learn more about how to use PRNGs in parallel or distributed environments, I suggest you read this scientific articles:

Practical distribution of random streams for stochastic High Performance Computing, David RC Hill, in International Conference on High Performance Computing and Simulation (HPCS), 2010

like image 70
jopasserat Avatar answered Sep 27 '22 22:09

jopasserat