I intend to write a C++11 application for Linux which does some numerical simulation (not cryptography) based on approximately one million pseudorandom 32bit numbers. To speed things up, I'd like to perform the simulation in parallel threads using all cores of a desktop CPU. I'd like to use the Mersenne Twister mt19937
provided by boost as the PRNG, and I guess that for performance reasons I should have one such PRNG per thread. Now I'm unsure about how to seed them in order to avoid generating the same subsequence of random numbers in multiple threads.
Here are the alternatives that I have thought of so far:
Seed the PRNG for every thread independently from /dev/urandom
.
I'm a bit worried about the case when the system entropy pool gets exhausted, as I don't know how the system internal PRNG operates. Could it happen that I accidentially get consecutive seeds which exactly identify consecutive states of the Mersenne Twister, due to the fact that /dev/urandom
is using a Mersenne Twister itself? Probably strongly related to my concerns for the next point.
Seed one PRNG from /dev/urandom
and the others from that first one.
Basically the same concern as well: is it good or bad to use one PRNG to seed another that uses the same algorithm? Or in other words, does reading 625 32bit integers from a mt19937
correspond directly to the internal state of the mt19937
generator at any point during this generation?
Seed others from first with non-Mersenne information.
As using the same algorithm to generate random numbers and to generate the initial seed feels somehow like it might be a bad idea, I thought about introducing some element which is not dependent on the Mersenne Twister algorithm. For example, I could XOR the thread id into each element of the initial seed vector. Does that make things any better?
Share one PRNG among threads.
This would make sure that there is only one sequence, with all the known and desirable properties of the Mersenne Twister. But the locking overhead required to control access to that generator does worry me somewhat. As I have found no evidence to the contrary, I assume that I as the library user would be responsible for preventing concurrent access to the PRNG.
Pre-generate all random numbers.
This would have one thread generate all the required 1M random numbers up front, to be used by the different threads later on. The memory requirement of 4M would be small compared to that of the overall application. What worries me most in this approach is that the generation of random numbers itself is not concurrent. This whole approach also doesn't scale too well.
Which of these approaches would you suggest, and why? Or do you have a different suggestion?
Do you know which of my concerns are justified and which are simply due to my lack of insight into how things actually work?
The function rand() is not reentrant or thread-safe, since it uses hidden state that is modified on each call.
Since the random number instance is not thread-safe when two threads call the next() method at the same time it will generate 0 as output and then the random number generates 0 and is not useful.
If you want to create random numbers in the range of integers in Java than best is to use random. nextInt() method it will return all integers with equal probability. You can also use Math. random() method to first create random number as double and than scale that number into int later.
I would go with #1, seed every prng from urandom. This ensures that the states are totally independent (as far as the seed data is independent). Typically there will be plenty of entropy available unless you have many threads. Also, depending on the algorithm used for /dev/urandom you almost certainly don't need to worry about it.
So you might use something like the following to create each prng:
#include <random> std::mt19937 get_prng() { std::random_device r; std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()}; return std::mt19937(seed); }
You should verify that your implementation of std::random_device
pulls from /dev/urandom
under your configuration. And if it does use /dev/urandom by default then usually you can say std::random_device("/dev/random")
if you want to use /dev/random instead.
You could use a PRNG with a different algebraic structure to seed the different PRNGs. E.g. some sequence of MD5 hashes.
However I would opt for #5. If it works then it is fine. If it does not you can still optimize it.
The point is creating a good PRNG is much harder than one might expect. A good PRNG for threaded applications is most probably something that is still subject to research.
If the number of CPUs is low enough you could get away with leap frogging. E.g. if you have 4 cores, initialize all with the same values, but advance core 1 PRNG by 1, #2 by and #3 by 3. Then advance always 4 steps when you need a new number.
I'd use one instance to seed the others. I'm pretty sure you can do this safely fairly easily.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With