I'm looking for an efficient Python function that randomly allocates an integer across k bins.
That is, some function allocate(n, k) will produce a k-sized array of integers summing to n.
For example, allocate(4, 3) could produce [4, 0, 0], [0, 2, 2], [1, 2, 1], etc.
It should be randomly distributed per item, assigning each of the n items randomly to each of the k bins.
If you are looking for a uniform distribution across all possible allocations (which is different from randomly distributing each item individually):
Using the "stars and bars" approach, we can transform this into a question of picking k-1 positions for possible dividers from a list of n+k-1 possible positions. (Wikipedia proof)
from random import sample
def allocate(n,k):
dividers = sample(range(1, n+k), k-1)
dividers = sorted(dividers)
dividers.insert(0, 0)
dividers.append(n+k)
return [dividers[i+1]-dividers[i]-1 for i in range(k)]
print(allocate(4,3))
There are ((n+k-1) choose (k-1)) possible distributions, and this is equally likely to result in each one of them.
(This is a modification of Wave Man's solution: that one is not uniform across all possible solutions: note that the only way to get [0,0,4] is to roll (0,0), but there are two ways to get [1,2,1]; rolling (1,3) or (3,1). Choosing from n+k-1 slots and counting dividers as taking a slot corrects for this. In this solution, the random sample (1,2) corresponds to [0,0,4], and the equally likely random sample (2,5) corresponds to [1,2,1])
This should be faster than your brute-force version when n >> k:
def allocate(n, k):
result = np.zeros(k)
sum_so_far = 0
for ind in range(k-1):
draw = np.random.randint(n - sum_so_far + 1)
sum_so_far += draw
result[ind] = draw
result[k-1] = n - sum_so_far
return result
The idea is to draw a random number up to some maximum m (which starts out equal to n), and then we subtract that number from the maximum for the next draw, and so on, thus guaranteeing that we will never exceed n. This way we fill up the first k-1 entries; the final one is filled with whatever is missing to get a sum of exactly n.
Note: I am not sure whether this results in a "fair" random distribution of values or if it is somehow biased towards putting larger values into earlier indices or something like that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With