Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating a random (equal probability) combination with replacement

I want to generate one random combination out of all possible combinations_with_replacement. The tricky bit is that I want each of the possible outcomes to have the same probability without needing to generate (not even implicit) all the possible outcomes.

For example:

import itertools
import random

random.choice(list(itertools.combinations_with_replacement(range(4), 2)))

That approach is way too slow (and memory expensive) because it needs to create all possible combinations whereas I only want one.

It's not so bad if I determine how many combinations_with_replacement there will be and use random.randrange together with next and itertools.islice on the itertools.combinations_with_replacement. That doesn't need to generate all possible combinations (except in the worst-case). But it's still too slow.

On the other hand the recipe mentioned in the itertools documentation is fast but not each combination has the same probability.

like image 276
MSeifert Avatar asked Mar 09 '23 00:03

MSeifert


1 Answers

Well, I'm in a bit of a dilemma, because I've found an algorithm that works, but I don't know why. So do what you want of if, maybe some mathematician in the room can work out the probabilities, but it does empirically work. The idea is to pick one element at a time, increasing the probability of the selected elements. I suspect the reasoning must be similar to that of reservoir sampling, but I didn't work it out.

from random import choice
from itertools import combinations_with_replacement

population = ["A", "B", "C", "D"]
k = 3

def random_comb(population, k):
    idx = []
    indices = list(range(len(population)))
    for _ in range(k):
        idx.append(choice(indices))
        indices.append(idx[-1])
    return tuple(population[i] for i in sorted(idx))

combs = list(combinations_with_replacement(population, k))
counts = {c: 0 for c in combs}

for _ in range(100000):
    counts[random_comb(population, k)] += 1

for comb, count in sorted(counts.items()):
    print("".join(comb), count)

The output is the number of times each possibility has appeared after 100,000 runs:

AAA 4913
AAB 4917
AAC 5132
AAD 4966
ABB 5027
ABC 4956
ABD 4959
ACC 5022
ACD 5088
ADD 4985
BBB 5060
BBC 5070
BBD 5056
BCC 4897
BCD 5049
BDD 5059
CCC 5024
CCD 5032
CDD 4859
DDD 4929
like image 93
jdehesa Avatar answered Apr 25 '23 17:04

jdehesa