In a different StackExchange, the following algorithm (based on arithmetic coding) was presented as an efficient method for generating outcomes of a 7-sided die, when all that is given is a 6-sided die:
int rand7()
{
static double a=0, width=7; // persistent state
while ((int)(a+width) != (int)a)
{
width /= 6;
a += (rand6()-1)*width;
}
int n = (int)a;
a -= n;
a *= 7; width *= 7;
return (n+1);
}
Not being a true mathematician, I'll do the best I can to explain how this algorithm works:
At the start of each call to rand7()
, width
is the ratio 7s/6t, and a
is a non-negative value with the property that a + width
lies in the interval [0, 7) after the base case. When entering the while
loop, width
is the maximal value that can be added to a
. If floor(a + width)
is different from floor(a)
, then a random choice of { 0, width
*1/6, width
*1/3, width
*1/2, width
*2/3, width
*5/6 } is added to a
, and the exponent t
is incremented by one (reducing the value of width
by a power of 6). Note that after an iteration, the property that a + width
lies in the interval [0, 7) remains invariant. When width
becomes smaller than the difference of ceil(a) - a
, the iterations stop. The loop adds more entropy into a
so long as doing so can actually affect the outcome of the die roll, and intuitively, this is building a random real number in the range of [0, 7) using base 6. After leaving the loop, the die roll is taken to be floor(a) + 1
, and a
is reduced to its fractional part. At that point a + width
lies in the interval [0, 1). To prepare for the next call and to maintain the invariant property, both a
and width
are scaled up by a factor of 7 (for width
, this increases the exponent s
by 1).
The above explains the workings of the inductive step. Analysis of the base case is left as an exercise for the interested reader.
Of course, from an efficiency standpoint, the use of floating-point arithmetic immediately pops out as a performance drag (assuming the performance of rand6()
is already sufficient and itself cannot be improved). While maintaining this arithmetic coding algorithm, what is the best approach to removing the use of floating-point?
Following up on a comment I made, here's the fixed point version of the algorithm. It uses unsigned 4.60 (that is, there are 60 bits in the fractional part of the number), which is a few more bits than you can get out of a double
:
int rand7fixed() {
static uint64_t a = 0;
static uint64_t width = 7UL<<60;
static const uint64_t intmask = 0xfUL<<60;
while (((a+width)&intmask) != (a&intmask)) {
width /= 6;
a += (rand6()-1)*width;
}
int n = a >> 60;
a &=~intmask;
a *= 7;
width *= 7;
return n+1;
}
The above turns out to be about a third faster than the double version in the OP (see benchmark results and notes below). I suspect that the time-waster is not the floating point arithmetic, but rather the conversions from double
to int
.
As R.. points out, this does not fix the bias; it merely reduces it. A simple and reasonably fast unbiased algorithm would be to repeatedly generate two rand6()
values h
and l
until at least one of them is non-zero, and then return h*6+l % 7
:
int rand7_2() {
int hi, lo;
do {
hi = rand6() - 1;
lo = rand6() - 1;
} while (hi + lo);
return (hi * 6 + lo) % 7 + 1;
}
If you feel the need to reduce the number of calls to rand6
, you can use the fact that 612 is only slightly more than 711 to generate 11 7-die rolls at a time. It is still necessary to discard some of the sets of 12 6-rolls in order to remove bias; the frequency of discarding sets will be (612−711)/612)
, or approximately 1 in 11, so on average you'll need about 1.19 6-rolls per 7-roll. You could do better by using 25 6-rolls to generate 23 7-rolls (1.13 6-rolls per 7-roll) but that doesn't quite fit into 64-bit arithmetic, so the marginal advantage in calls to rand6
would be chewed up by doing the calculations in 128-bit.
Here's the 11/12 solution:
int rand7_12() {
static int avail = 0;
static uint32_t state = 0;
static const uint32_t discard = 7*7*7*7*7*7*7*7*7*7*7; // 7 ** 11
static const int out_per_round = 11;
static const int in_per_round = 12;
if (!avail) {
do {
state = rand6() - 1;
for (int needed = in_per_round - 1; needed; --needed)
state = state * 6 + rand6() - 1;
}
while (state >= discard);
avail = out_per_round;
}
int rv = state % 7;
state /= 7;
--avail;
return rv + 1;
}
In theory you should be able to reduce the ratio to log76
, which is about 1.086. For example, you could do that by generating 895 7-rolls from 972 6-rolls, discarding about one in 1600 of the sets for an average of 1.087 6-rolls/7-roll, but you'd need 2513-bit arithmetic to hold the state.
I tested all four functions with a not-very-precise benchmark which calls rand7 700,000,000 times and then prints a histogram of the results. The results:
User time with
Algorithm User time rand6() calls cycling rand6()
---------- --------- ------------- ---------------
double 32.6 secs 760223193 13.2 secs
fixed 29.4 secs 760223194 7.9 secs
2 for 1 40.2 secs 1440004276
12 for 11 23.7 secs 840670008
The underlying rand6() implementation above is the Gnu standard c++ library's uniform_int_distribution<>(1,6)
using mt19937_64
(a 64-bit Mersenne Twister) as a PRNG. To try to get a better handle on the amount of time spent in the standard library, I also ran the test using a simple cyclic counter as a pseudorandom number generator; the remaining 13.2 and 7.9 seconds represent (roughly) the time spent in the algorithm itself, from which we can say that the fixed-point algorithm is about 40% faster. (It's hard to get good reading on the combining algorithms because the fixed sequence makes branch prediction much easier and reduces the number of rand6 calls, but both took less than 5 seconds.)
Finally, the histograms in case anyone wants to check for bias (also includes a run with std::uniform_int_distribution(1,7)
for reference):
Algorithm 1 2 3 4 5 6 7
--------- --------- --------- --------- --------- --------- --------- ---------
reference 100007522 100002456 100015800 100005923 99972185 100008908 99987206
double 100014597 100005975 99982219 99986299 100004561 100011049 99995300
fixed 100009603 100009639 100034790 99989935 99995502 99981886 99978645
2 for 1 100004476 99997766 99999521 100001382 99992802 100003868 100000185
12 for 11 99988156 100004974 100020070 100001912 99997472 99995015 99992401
[Edit]
Need to to improve the far below method, but follows is a simple unbiased method. It is inefficient as it calls rand6()
at least twice ever time. (Assume rand6()
is unbiased)
int rand7simple(void) {
int product;
do {
int a = rand6() - 1;
int b = rand6() - 1;
product = a*6 + b; // produce unbiased distributed numbers 0 .. 35
} while (product >= 35); // Redo 1 in 36 times
// produce unbiased distributed numbers 0 .. 34
return product%7 + 1;
}
For every 6 calls to rand7()
, rand6()
should be called 7 times.
Initialize a wide integer state to minimize bias.
Need to test this later. GTG.
int rand7(void) {
static int count = -1;
static unsigned long long state = 0;
if (count < 0) {
count = 0;
for (int i=0; i<25; i++) {
state *= 6;
state += rand6();
}
int retval = state % 7;
state /= 7;
int i = (count >= 6) + 1;
if (++count > 6) count = 0;
while (i-- > 0) {
state *= 6;
state += rand6();
}
return retval;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With