Improve performance of 7-sided die roll simulation from a 6-sided die implementation

Question

In a different StackExchange, the following algorithm (based on arithmetic coding) was presented as an efficient method for generating outcomes of a 7-sided die, when all that is given is a 6-sided die:

int rand7()
{
  static double a=0, width=7;  // persistent state

  while ((int)(a+width) != (int)a)
  {
    width /= 6;
    a += (rand6()-1)*width;
  }

  int n = (int)a;
  a -= n; 
  a *= 7; width *= 7;
  return (n+1);
}

Not being a true mathematician, I'll do the best I can to explain how this algorithm works:

At the start of each call to rand7(), width is the ratio 7^s/6^t, and a is a non-negative value with the property that a + width lies in the interval [0, 7) after the base case. When entering the while loop, width is the maximal value that can be added to a. If floor(a + width) is different from floor(a), then a random choice of { 0, width*1/6, width*1/3, width*1/2, width*2/3, width*5/6 } is added to a, and the exponent t is incremented by one (reducing the value of width by a power of 6). Note that after an iteration, the property that a + width lies in the interval [0, 7) remains invariant. When width becomes smaller than the difference of ceil(a) - a, the iterations stop. The loop adds more entropy into a so long as doing so can actually affect the outcome of the die roll, and intuitively, this is building a random real number in the range of [0, 7) using base 6. After leaving the loop, the die roll is taken to be floor(a) + 1, and a is reduced to its fractional part. At that point a + width lies in the interval [0, 1). To prepare for the next call and to maintain the invariant property, both a and width are scaled up by a factor of 7 (for width, this increases the exponent s by 1).

The above explains the workings of the inductive step. Analysis of the base case is left as an exercise for the interested reader.

Of course, from an efficiency standpoint, the use of floating-point arithmetic immediately pops out as a performance drag (assuming the performance of rand6() is already sufficient and itself cannot be improved). While maintaining this arithmetic coding algorithm, what is the best approach to removing the use of floating-point?

rici · Accepted Answer

Following up on a comment I made, here's the fixed point version of the algorithm. It uses unsigned 4.60 (that is, there are 60 bits in the fractional part of the number), which is a few more bits than you can get out of a double:

int rand7fixed() {
    static uint64_t a = 0;
    static uint64_t width = 7UL<<60;
    static const uint64_t intmask = 0xfUL<<60;

    while (((a+width)&intmask) != (a&intmask)) {
      width /= 6;
      a += (rand6()-1)*width;
    }

    int n = a >> 60;
    a &=~intmask;
    a *= 7;
    width *= 7;
    return n+1;
}

The above turns out to be about a third faster than the double version in the OP (see benchmark results and notes below). I suspect that the time-waster is not the floating point arithmetic, but rather the conversions from double to int.

As R.. points out, this does not fix the bias; it merely reduces it. A simple and reasonably fast unbiased algorithm would be to repeatedly generate two rand6() values h and l until at least one of them is non-zero, and then return h*6+l % 7:

int rand7_2() {
  int hi, lo;
  do {
    hi = rand6() - 1;
    lo = rand6() - 1;
  } while (hi + lo);
  return (hi * 6 + lo) % 7 + 1;
}

If you feel the need to reduce the number of calls to rand6, you can use the fact that 6¹² is only slightly more than 7¹¹ to generate 11 7-die rolls at a time. It is still necessary to discard some of the sets of 12 6-rolls in order to remove bias; the frequency of discarding sets will be (6¹²−7¹¹)/6¹²), or approximately 1 in 11, so on average you'll need about 1.19 6-rolls per 7-roll. You could do better by using 25 6-rolls to generate 23 7-rolls (1.13 6-rolls per 7-roll) but that doesn't quite fit into 64-bit arithmetic, so the marginal advantage in calls to rand6 would be chewed up by doing the calculations in 128-bit.

Here's the 11/12 solution:

int rand7_12() {
    static int avail = 0;
    static uint32_t state = 0;
    static const uint32_t discard = 7*7*7*7*7*7*7*7*7*7*7; // 7 ** 11
    static const int out_per_round = 11;
    static const int in_per_round = 12;

    if (!avail) {
      do {
        state = rand6() - 1;
        for (int needed = in_per_round - 1; needed; --needed)
          state = state * 6 + rand6() - 1;
      }
      while (state >= discard);
      avail = out_per_round;
    }
    int rv = state % 7;
    state /= 7;
    --avail;
    return rv + 1;
}

In theory you should be able to reduce the ratio to log₇6, which is about 1.086. For example, you could do that by generating 895 7-rolls from 972 6-rolls, discarding about one in 1600 of the sets for an average of 1.087 6-rolls/7-roll, but you'd need 2513-bit arithmetic to hold the state.

I tested all four functions with a not-very-precise benchmark which calls rand7 700,000,000 times and then prints a histogram of the results. The results:

                                                  User time with
Algorithm       User time      rand6() calls     cycling rand6()
----------      ---------      -------------     ---------------
double          32.6 secs          760223193           13.2 secs
fixed           29.4 secs          760223194            7.9 secs
2 for 1         40.2 secs         1440004276
12 for 11       23.7 secs          840670008

The underlying rand6() implementation above is the Gnu standard c++ library's uniform_int_distribution<>(1,6) using mt19937_64 (a 64-bit Mersenne Twister) as a PRNG. To try to get a better handle on the amount of time spent in the standard library, I also ran the test using a simple cyclic counter as a pseudorandom number generator; the remaining 13.2 and 7.9 seconds represent (roughly) the time spent in the algorithm itself, from which we can say that the fixed-point algorithm is about 40% faster. (It's hard to get good reading on the combining algorithms because the fixed sequence makes branch prediction much easier and reduces the number of rand6 calls, but both took less than 5 seconds.)

Finally, the histograms in case anyone wants to check for bias (also includes a run with std::uniform_int_distribution(1,7) for reference):

Algorithm          1          2          3          4          5          6          7
---------  ---------  ---------  ---------  ---------  ---------  ---------  ---------
reference  100007522  100002456  100015800  100005923   99972185  100008908   99987206
double     100014597  100005975   99982219   99986299  100004561  100011049   99995300
fixed      100009603  100009639  100034790   99989935   99995502   99981886   99978645
2 for 1    100004476   99997766   99999521  100001382   99992802  100003868  100000185
12 for 11   99988156  100004974  100020070  100001912   99997472   99995015   99992401

chux - Reinstate Monica · Answer

[Edit]

Need to to improve the far below method, but follows is a simple unbiased method. It is inefficient as it calls rand6() at least twice ever time. (Assume rand6() is unbiased)

int rand7simple(void) {
  int product;
  do {
    int a = rand6() - 1;
    int b = rand6() - 1;
    product = a*6 + b;  // produce unbiased distributed numbers 0 .. 35
  } while (product >= 35);  // Redo 1 in 36 times
  // produce unbiased distributed numbers 0 .. 34
  return product%7 + 1;
}

For every 6 calls to rand7(), rand6() should be called 7 times. Initialize a wide integer state to minimize bias.

Need to test this later. GTG.

int rand7(void) {
  static int count = -1;
  static unsigned long long state = 0;
  if (count < 0) {
    count = 0;
    for (int i=0; i<25; i++) {
      state *= 6;
      state += rand6();
    }
  int retval = state % 7;
  state /= 7;

  int i = (count >= 6) + 1; 
  if (++count > 6) count = 0;
  while (i-- > 0) {
    state *= 6;
    state += rand6();
  }
  return retval;
}

Improve performance of 7-sided die roll simulation from a 6-sided die implementation

Tags:

performance

c

algorithm

random

jxh

2 Answers

rici

chux - Reinstate Monica

Recent Activity

Donate For Us

Improve performance of 7-sided die roll simulation from a 6-sided die implementation

Tags:

performance

c

algorithm

random

jxh

2 Answers

rici

chux - Reinstate Monica

Related questions

Recent Activity

Donate For Us