As part of a Monte Carlo simulation, I have to roll a group of dice until certain values show up a certain amount of times. My code that does this calls upon a dice class which generates a random number between 1 and 6, and returns it. Originally the code looked like
public void roll() {
value = (int)(Math.random()*6) + 1;
}
and it wasn't very fast. By exchanging Math.random() for
ThreadLocalRandom.current().nextInt(1, 7);
It ran a section in roughly 60% of the original time, which called this around about 250 million times. As part of the full simulation it will call upon this method billions of times at the very least, so is there any faster way to do this?
The inversion method relies on the principle that continuous cumulative distribution functions (cdfs) range uniformly over the open interval (0,1). If u is a uniform random number on (0,1), then x = F - 1 ( u ) generates a random number x from any continuous distribution with the specified cdf F .
X = rand( n ) returns an n -by- n matrix of uniformly distributed random numbers.
To generate random numbers from a uniform distribution you can use the runif() function. Alternatively, you can use sample() to take a random sample using with or without replacements.
Pick a random generator that is as fast and as good as you need it to be, and that isn't slowed down to a tiny fraction of its normal speed by thread safety mechanisms. Then pick a method of generating the [1..6] integer distribution that is a fast and as precise as you need it to be.
The fastest simple generator that is of sufficiently high quality to beat standard tests for PRNGs like TestU01 (instead of failing systematically, like the Mersenne Twister) is Sebastiano Vigna's xorshift64*. I'm showing it as C code but Sebastiano has it in Java as well:
uint64_t xorshift64s (int64_t &x)
{
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
return x * 2685821657736338717ull;
}
Sebastiano Vigna's site has lots of useful info, links and benchmark results. Including papers, for the mathematically inclined.
At that high resolution you can simply use 1 + xorshift64s(state) % 6
and the bias will be immeasurably small. If that is not fast enough, implement the modulo division by multiplication with the inverse. If that is not fast enough - if you cannot afford two MULs per variate - then it gets tricky and you need to come back here. xorshift1024* (Java) plus some bit trickery for the variate would be an option.
Batching - generating an array full of numbers and processing that, then refilling the array and so on - can unlock some speed reserves. Needlessly wrapping things in classes achieves the opposite.
P.S.: if ThreadLocalRandom and xorshift* are not fast enough for your purposes even with batching then you might be going about things in the wrong way, or you might be doing it in the wrong language. Or both.
P.P.S.: in languages like Java (or C#, or Delphi), abstraction is not free, it has a cost. In Java you also have to reckon with things like mandatory gratuitous array bounds checking, unless you have a compiler that can eliminate those checks. Teasing high performance out of a Java program can get very involved... In C++ you get abstraction and performance for free.
Darth is correct that Xorshift* is probably the best generator to use. Use it to fill a ring buffer of bytes, then fetch the bytes one at a time to roll your dice, refilling the buffer when you've fetched enough. To get the actual die roll, avoid division and bias by using rejection sampling. The rest of the code then looks something like this (in C):
do {
if (bp >= buffer + sizeof buffer) {
// refill buffer with Xorshifts
}
v = *bp++ & 7;
} while (v > 5);
return v;
This will allow you to get on average 6 die rolls per 64-bit random value.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With