Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Boost Mersenne Twister: how to seed with more than one value?

I'm using the boost mt19937 implementation for a simulation.

The simulation needs to be reproducible, and that means storing and potentially reusing the RNG seeds later. I'm using the windows crypto api to generate the seed values because I need an external source for the seeds and not because of any particular guarantees of randomness. The output of any simulation run will have a note including the RNG seed - so the seed needs to be reasonably short. On the other hand, as part of the analysis of the simulation, I'll be comparing several runs - but to be sure that these runs are actually different, I'll need to use different seeds - so the seed needs to be long enough to avoid accidental collisions.

I've determined that 64-bits of seeding should suffice; the chance of a collision will reach 50% after about 2^32 runs - that probability is low enough that the average error caused by it is negligible to me. Using just 32-bits of seed is tricky; the chance of a collision reaches 50% already after 2^16 runs; and that's a little too likely for my tastes.

Unfortunately, the boost implementation either seeds with a full state vector - which is far, far too long - or a single 32-bit unsigned long - which isn't ideal.

How can I seed the generator with more than 32-bits but less than a full state vector? I tried just padding the vector or repeating the seeds to fill the state vector, but even a cursory glance at the results shows that that generates poor results.

like image 500
Eamon Nerbonne Avatar asked May 26 '10 16:05

Eamon Nerbonne


2 Answers

Your assumptions are mistaken. For a simulation, you don't need cryptographically strong seeds. In fact, using seeds 1,2,3,4, etcetera is often a better idea. The output values of the Mersenne Twister will be uncorrelated, yet nobody will question whether you cherry-picked your seeds to get desired simulation outputs.

For other people who do have a real need, one easy way is to discard the first (seed>>32) values generated. This gives you about log2(seed>>32) extra bits of state. However, it only works efficiently if you need a few extra bits. Adding 32 bits this way is probably too slow.

A faster algorithm is to generate the full state vector for the good random generator. The solutions mentioned in the question (repeating or padding) aren't so good due to the limited randomness in the resulting state vector. But if you fill the initial state vector from the output of mersenne_twister(seed1) ^ mersenne_twister(seed2), this is not an issue at all.

like image 132
MSalters Avatar answered Sep 22 '22 23:09

MSalters


Looking at boost sources of mersenne_twister template:

  void seed(UIntType value)
  {
    // New seeding algorithm from 
    // http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html
    // In the previous versions, MSBs of the seed affected only MSBs of the
    // state x[].
    const UIntType mask = ~0u;
    x[0] = value & mask;
    for (i = 1; i < n; i++) {
      // See Knuth "The Art of Computer Programming" Vol. 2, 3rd ed., page 106
      x[i] = (1812433253UL * (x[i-1] ^ (x[i-1] >> (w-2))) + i) & mask;
    }
  }

For mt19937 UIntType is uint32_t, w is 32. For 64-bit seed, maybe you could use the lower 32 bits to calculate every even index of the state (x) and the higher 32 bits to calculate the odd indices of the state, using that algorithm.

(This is cargo cult suggestion though)

like image 39
sbk Avatar answered Sep 21 '22 23:09

sbk