Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Allocate an integer randomly across k bins

Tags:

python

random

I'm looking for an efficient Python function that randomly allocates an integer across k bins. That is, some function allocate(n, k) will produce a k-sized array of integers summing to n.

For example, allocate(4, 3) could produce [4, 0, 0], [0, 2, 2], [1, 2, 1], etc.

It should be randomly distributed per item, assigning each of the n items randomly to each of the k bins.

like image 619
Max Ghenis Avatar asked Nov 01 '25 06:11

Max Ghenis


2 Answers

If you are looking for a uniform distribution across all possible allocations (which is different from randomly distributing each item individually):

Using the "stars and bars" approach, we can transform this into a question of picking k-1 positions for possible dividers from a list of n+k-1 possible positions. (Wikipedia proof)

from random import sample

def allocate(n,k):
    dividers = sample(range(1, n+k), k-1)
    dividers = sorted(dividers)
    dividers.insert(0, 0)
    dividers.append(n+k)
    return [dividers[i+1]-dividers[i]-1 for i in range(k)]
    
print(allocate(4,3))

There are ((n+k-1) choose (k-1)) possible distributions, and this is equally likely to result in each one of them.

(This is a modification of Wave Man's solution: that one is not uniform across all possible solutions: note that the only way to get [0,0,4] is to roll (0,0), but there are two ways to get [1,2,1]; rolling (1,3) or (3,1). Choosing from n+k-1 slots and counting dividers as taking a slot corrects for this. In this solution, the random sample (1,2) corresponds to [0,0,4], and the equally likely random sample (2,5) corresponds to [1,2,1])

like image 68
Richard Yannow Avatar answered Nov 03 '25 20:11

Richard Yannow


This should be faster than your brute-force version when n >> k:

def allocate(n, k):
    result = np.zeros(k)
    sum_so_far = 0
    for ind in range(k-1):
        draw = np.random.randint(n - sum_so_far + 1)
        sum_so_far += draw
        result[ind] = draw
    result[k-1] = n - sum_so_far

    return result

The idea is to draw a random number up to some maximum m (which starts out equal to n), and then we subtract that number from the maximum for the next draw, and so on, thus guaranteeing that we will never exceed n. This way we fill up the first k-1 entries; the final one is filled with whatever is missing to get a sum of exactly n.

Note: I am not sure whether this results in a "fair" random distribution of values or if it is somehow biased towards putting larger values into earlier indices or something like that.

like image 20
xdurch0 Avatar answered Nov 03 '25 20:11

xdurch0



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!