Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to seed mt19937_64 for Monte Carlo simulations

I'm working on a program that runs Monte Carlo simulation; specifically, I'm using a Metropolis algorithm. The program needs to generate possibly billions of "random" numbers. I know that the Mersenne twister is very popular for Monte Carlo simulation, but I would like to make sure that I am seeding the generator in the best way possible.

Currently I'm computing a 32-bit seed using the following method:

mt19937_64 prng; //pseudo random number generator
unsigned long seed; //store seed so that every run can follow the same sequence
unsigned char seed_count; //to help keep seeds from repeating because of temporal proximity

unsigned long genSeed() {
    return (  static_cast<unsigned long>(time(NULL))      << 16 )
         | ( (static_cast<unsigned long>(clock()) & 0xFF) << 8  )
         | ( (static_cast<unsigned long>(seed_count++) & 0xFF) );
}

//...

seed = genSeed();
prng.seed(seed);

I have a feeling there are much better ways to assure non-repeating new seeds, and I'm quite sure mt19937_64 can be seeded with more then 32-bits. Does anyone have any suggestions?

like image 647
Mathhead200 Avatar asked Jun 20 '14 19:06

Mathhead200


3 Answers

Use std::random_device to generate the seed. It'll provide non-deterministic random numbers, provided your implementation supports it. Otherwise it's allowed to use some other random number engine.

std::mt19937_64 prng;
seed = std::random_device{}();
prng.seed(seed);

operator() of std::random_device returns an unsigned int, so if your platform has 32-bit ints, and you want a 64-bit seed, you'll need to call it twice.

std::mt19937_64 prng;
std::random_device device;
seed = (static_cast<uint64_t>(device()) << 32) | device();
prng.seed(seed);

Another available option is using std::seed_seq to seed the PRNG. This allows the PRNG to call seed_seq::generate, which produces a non-biased sequence over the range [0 ≤ i < 232), with an output range large enough to fill its entire state.

std::mt19937_64 prng;
std::random_device device;
std::seed_seq seq{device(), device(), device(), device()};
prng.seed(seq);

I'm calling the random_device 4 times to create a 4 element initial sequence for seed_seq. However, I'm not sure what the best practice for this is, as far as length or source of elements in the initial sequence is concerned.

like image 86
Praetorian Avatar answered Nov 08 '22 08:11

Praetorian


Let's recap (comments too), we want to generate different seeds to get independent sequences of random numbers in each of the following occurrences:

  1. The program is relaunched on the same machine later,
  2. Two threads are launched on the same machine at the same time,
  3. The program is launched on two different machines at the same time.

1 is solved using time since epoch, 2 is solved with a global atomic counter, 3 is solved with a platform dependent id (see How to obtain (almost) unique system identifier in a cross platform way?)

Now the point is what is the best way to combine them to get a uint_fast64_t (the seed type of std::mt19937_64)? I assume here that we do not know a priori the range of each parameter or that they are too big, so that we cannot just play with bit shifts getting a unique seed in a trivial way.

A std::seed_seq would be the easy way to go, however its return type uint_least32_t is not our best choice.

A good 64 bits hasher is a much better choice. The STL offers std::hash under the functional header, a possibility is to concatenate the three numbers above into a string and then passing it to the hasher. The return type is a size_t which on 64 machines is very likely to match our requirements.

Collisions are unlikely but of course possible, if you want to be sure to not build up statistics that include a sequence more than once, you can only store the seeds and discard the duplicated runs.

A std::random_device could also be used to generate the seeds (collisions may still happen, hard to say if more or less often), however since the implementation is library dependent and may go down to a pseudo random generator, it is mandatory to check the entropy of the device and avoid to a use zero-entropy device for this purpose as you will probably break the points above (especially point 3). Unfortunately you can discover the entropy only when you take the program to the specific machine and test with the installed library.

like image 6
DarioP Avatar answered Nov 08 '22 08:11

DarioP


As far as I can tell from your comments, it seems that what you are interested in is ensuring that if a process starts several of your simulations at exactly the same time, they will get different seeds.

The only significant problem I can see with your current approach is a race condition: if you are going to start multiple simulations simultaneously, it must be done from separate threads. If it is done from separate threads, you need to update seed_count in a thread-safe manner, or multiple simulations could end up with the same seed_count. You could simply make it an std::atomic<int> to solve that.

Beyond that, it just seems more complicated than it has to be. What do you gain by using two separate timers? You could do something as simple as this:

  1. at program startup, grab the current system time (using a high resolution timer) once, and store that.
  2. assign each simulation a unique ID (this could just be an integer initialized to 0, (which should be generated without any race conditions, as mentioned above) which is incremented each time a simulation starts, effectively like your seed_count.
  3. when seeding a simulation, just use the initially generated timestamp + the unique ID. If you do this, every simulation in the process is assured a unique seed.
like image 4
jalf Avatar answered Nov 08 '22 10:11

jalf