
Weighted random sample without replacement in python

I need to obtain a k-sized sample without replacement from a population, where each member of the population has an associated weight (W).

Python's random.choices will not perform this task without replacement, and random.sample won't take a weighted input.

Currently, this is what I am using:

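import random
import numpy as np

# population, W and Parent_number are defined elsewhere
P = np.zeros((1, Parent_number))
n = 0
while n < Parent_number:
    draw = random.choices(population, weights=W, k=1)
    # compare only against the n values drawn so far, so the zero
    # placeholders in P are never mistaken for a drawn member
    if draw[0] not in P[0, :n]:
        P[0, n] = draw[0]
        n = n + 1
P = np.asarray(sorted(P[0]))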

While this works, it requires switching back and forth between arrays and lists and is therefore less than ideal.

I am looking for the simplest and easiest to understand solution as this code will be shared with others.

Austin Downey asked Apr 21 '17 18:04




2 Answers

You can use np.random.choice with replace=False as follows:

np.random.choice(vec,size,replace=False, p=P)

where vec is your population and P is the vector of selection probabilities, which must sum to 1.

For example:

import numpy as np
vec=[1,2,3]
P=[0.5,0.2,0.3]
np.random.choice(vec,size=2,replace=False, p=P)
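
Since np.random.choice expects p to hold probabilities that sum to 1, raw weights such as the W in the question need to be normalized first. A minimal sketch, assuming illustrative values for the population and weights (they are not taken from the question):

```python
import numpy as np

# Illustrative data (assumed for this sketch, not from the question)
population = np.array([10, 20, 30, 40, 50])
W = np.array([5.0, 1.0, 3.0, 0.5, 0.5])   # raw, unnormalized weights
k = 3

p = W / W.sum()                            # probabilities that sum to 1
sample = np.random.choice(population, size=k, replace=False, p=p)
sample = np.sort(sample)                   # sorted, as in the question's code
print(sample)
```

The result is already a numpy array, so no conversion between lists and arrays is needed.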
Miriam Farber answered Oct 02 '22 15:10



Built-in solution

As suggested by Miriam Farber, you can just use numpy's built-in solution:

np.random.choice(vec,size,replace=False, p=P)

Pure python equivalent

What follows is close to what numpy does internally. Numpy, of course, uses numpy arrays, whereas this version uses plain Python lists and random.choices():

from random import choices

def weighted_sample_without_replacement(population, weights, k=1):
    weights = list(weights)          # copy, so the caller's weights are untouched
    positions = range(len(population))
    indices = []
    while True:
        needed = k - len(indices)
        if not needed:
            break
        # draw a batch with replacement, then keep only first-time picks
        for i in choices(positions, weights, k=needed):
            if weights[i]:
                weights[i] = 0.0     # zero the weight so i cannot be picked again
                indices.append(i)
    return [population[i] for i in indices]
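
One caveat with the loop above: if k exceeds the number of entries with a positive weight, it cannot finish normally, because there are not enough distinct items to collect. A minimal sketch of a guarded variant follows; the name guarded_weighted_sample and the choice of ValueError are assumptions made for illustration, not part of the original answer.

```python
from random import choices

def guarded_weighted_sample(population, weights, k=1):
    """Same rejection loop as above, with an up-front size check (hypothetical helper)."""
    weights = list(weights)
    available = sum(1 for w in weights if w > 0)
    if k > available:
        raise ValueError(f"cannot draw {k} items from {available} with positive weight")
    positions = range(len(population))
    indices = []
    while len(indices) < k:
        # draw the missing number of items, skipping anything already taken
        for i in choices(positions, weights, k=k - len(indices)):
            if weights[i]:
                weights[i] = 0.0   # picked once, never again
                indices.append(i)
    return [population[i] for i in indices]

picked = guarded_weighted_sample(['a', 'b', 'c', 'd'], [5, 1, 3, 0], k=3)
print(picked)   # some ordering of 'a', 'b', 'c'; 'd' has zero weight
```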

Related problem: Selection when elements can be repeated

This is sometimes called an urn problem. For example, given an urn with 10 red balls, 4 blue balls, and 18 green balls, choose nine balls without replacement.

To do it with numpy, generate the unique selections from the total population count with sample(). Then, bisect the cumulative weights to get the population indices.

import numpy as np
from random import sample

population = np.array(['red', 'blue', 'green'])
counts = np.array([10, 4, 18])
k = 9

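cum_counts = np.add.accumulate(counts)   # running totals: [10, 14, 32]
total = cum_counts[-1]
selections = sample(range(total), k=k)   # k distinct ball numbers in [0, total)
# map each ball number to the index of its color
indices = np.searchsorted(cum_counts, selections, side='right')
result = population[indices]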

To do this without numpy, the same approach can be implemented with bisect() and accumulate() from the standard library:

from random import sample
from bisect import bisect
from itertools import accumulate

population = ['red', 'blue', 'green']
weights = [10, 4, 18]
k = 9

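cum_weights = list(accumulate(weights))  # running totals: [10, 14, 32]
total = cum_weights.pop()                # 32; [10, 14] remain as bin boundaries
selections = sample(range(total), k=k)   # k distinct ball numbers in [0, total)
# map each ball number to the index of its color
indices = [bisect(cum_weights, s) for s in selections]
result = [population[i] for i in indices]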
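
The standard-library steps above can be wrapped in a small function. A minimal sketch, assuming a hypothetical helper name urn_sample; the Counter tally at the end is only there to confirm that no color is drawn more often than it occurs in the urn.

```python
from bisect import bisect
from collections import Counter
from itertools import accumulate
from random import sample

def urn_sample(population, counts, k):
    """Draw k items without replacement from an urn holding counts[i] copies of population[i]."""
    cum_counts = list(accumulate(counts))
    total = cum_counts[-1]
    # sample() picks k distinct ball numbers; bisect maps each one to its color
    return [population[bisect(cum_counts, s)] for s in sample(range(total), k=k)]

population = ['red', 'blue', 'green']
counts = [10, 4, 18]
drawn = urn_sample(population, counts, k=9)
tally = Counter(drawn)
print(tally)
```

Drawing all 32 balls with this helper returns exactly 10 red, 4 blue, and 18 green, which is a quick way to check the index mapping.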
Raymond Hettinger answered Oct 02 '22 17:10
