Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to use random bits to simulate a fair 26-sided die?

How do I use a random number generator that gives bits (0 or 1) to simulate a fair 26-sided die? I want to use a bitstream to pick letters of the English alphabet such that the odds of any one letter coming up is the same as the odds of any other letter (I know real words aren't like that and have specific frequency distributions for each letter but it doesn't matter here). What's the best way to use binary 0/1 decisions to pick letters fairly from the set A-Z? I can think of a few ways to map bits onto letters but it's not obvious to me that they won't be biased. Is there a known good way?

like image 548
Michael Levin Avatar asked May 22 '10 23:05

Michael Levin


2 Answers

If you restrict yourself to a finite number of bits and your die has 26 sides the method will always be biased. You have to allow the possibility that you will have to look at a potentially unlimited number of bits to be sure that it is unbiased.

A simple algorithm is to choose a random number between 0 and the next largest number of the form 2^n - 1 (31 in this case). If the number you randomly pick is too large, discard it and repick until you get a number in range.

Clearly this is not an optimal algorithm as you "waste" some information, but it should be good enough for most purposes. It is most wasteful if the number of sides of the die is just above 2^m for some m, for example: 33 sides. In this case you will have to discard the value almost 50% of the time.

like image 184
Mark Byers Avatar answered Jan 04 '23 10:01

Mark Byers


The basic answer here seems right - if your random number 0..32 is greater than 25, reroll. However, you can stack the odds against an arbitrarily-long result by looking for a multiple of 26 which provides a smaller chance of going long.

 32 -  26 =  6
 64 -  52 =  12
128 -  78 =  50

... and so on. I threw together a Python script to figure out the best available number of bits up to 32, for giggles, and got this result:

2^13 - 26 * 315 = 2
2^14 - 26 * 630 = 4

So either way, you have a 1 in 2^12 chance of rerolling if you use 13 or 14 bits. Your algorithm in this case would be:

def random_character():
    r = 8190
    while r >= 8190:
        r = rand(13) # assuming rand generates an N bit integer
    return chr(r % 26 + ord('a'))

EDIT: Out of curiosity, I compared those odds with a few important values, to see if 13 was really the optimal number (assuming you can generate any number of bits, 1 to 32, in the same amount of time - if you can't, 13 bits looks like the best). Based on my (admittedly sleepy) math, if you can get 32 bits as cheaply as 16, go for that instead. Otherwise, favor 13.

2^8 through 2^12: by definition, no better than 1/2^12 odds
2^16: diff is 16, so 1/2^11
2^17: diff is 6, so slightly under 1/2^14
2^18: diff is 12, so slightly under 1/2^12
2^19: diff is 24, so slightly under 1/2^14
2^20: diff is 22, so slightly under 1/2^15
2^21: diff is 18, so slightly under 1/2^16
2^22: diff is 10, so slightly under 1/2^18
2^23: diff is 20, so slightly under 1/2^18
2^24: diff is 14, so slightly under 1/2^20
2^25: diff is 2, so 1/2^24
2^26: diff is 4, so 1/2^24
2^27: diff is 8, so 1/2^24
2^28: diff is 16, so 1/2^24
2^29: diff is 6, so slightly under 1/2^26
2^30: diff is 12, so slightly under 1/2^26
2^31: diff is 24, so slightly under 1/2^26
2^32: diff is 22, so slightly under 1/2^27
like image 27
ojrac Avatar answered Jan 04 '23 11:01

ojrac