I need to obtain a k-sized sample without replacement from a population, where each member of the population has an associated weight (W).
Python's random.choices will not perform this task without replacement, and random.sample won't take a weighted input.
Currently, this is what I am using:
P = np.zeros((1,Parent_number))
n=0
while n < Parent_number:
    draw = random.choices(population,weights=W,k=1)
    if draw not in P:
        P[0,n] = draw[0]
        n=n+1
P=np.asarray(sorted(P[0]))
While this works, it requires switching back and forth between arrays and lists and is therefore less than ideal.
I am looking for the simplest and easiest to understand solution as this code will be shared with others.
The sample() function is a built-in function of Python's random module that returns a list of a specified length containing items chosen from a sequence such as a list, tuple, or string (sets are no longer accepted as of Python 3.11). It is used for random sampling without replacement.
Sampling without replacement: a subset of the observations is selected randomly, and once an observation is selected it cannot be selected again. Sampling with replacement: a subset of the observations is selected randomly, and an observation may be selected more than once.
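To make the distinction concrete, here is a minimal sketch using only the standard library. The `items` list and the fixed seed are illustrative assumptions, not part of the question.

```python
import random

random.seed(0)  # fixed seed only so the demo is repeatable (assumption)

items = ['a', 'b', 'c', 'd', 'e']  # hypothetical population

# Without replacement: every drawn item is distinct, and k cannot exceed len(items).
without = random.sample(items, k=3)

# With replacement: the same item may appear more than once, and k may exceed len(items).
with_repl = random.choices(items, k=8)

print(without)
print(with_repl)
```

Because `with_repl` holds eight draws from five items, it must contain at least one repeat, whereas `without` never does.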
You can use np.random.choice with replace=False as follows:
np.random.choice(vec,size,replace=False, p=P)
where vec is your population and P is the weight vector. Note that np.random.choice requires the entries of P to be probabilities that sum to 1.
For example:
import numpy as np
vec=[1,2,3]
P=[0.5,0.2,0.3]
np.random.choice(vec,size=2,replace=False, p=P)
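If the weights are not already probabilities, they need to be normalized first. The following is a minimal sketch under that assumption; it uses NumPy's `default_rng` generator, and the `population` and `W` values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed is only for repeatability (assumption)

population = np.array([10, 20, 30, 40])  # hypothetical population
W = np.array([5.0, 1.0, 3.0, 1.0])       # hypothetical raw weights

p = W / W.sum()  # choice() expects probabilities that sum to 1

k = 2
sample = rng.choice(population, size=k, replace=False, p=p)
sample.sort()  # sort in place, mirroring the sorted output in the question
print(sample)
```

Everything stays a NumPy array from start to finish, which avoids the array-to-list round trips described in the question.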
As suggested by Miriam Farber, you can just use numpy's built-in solution:
np.random.choice(vec,size,replace=False, p=P)
What follows is a pure-Python equivalent that is close to what numpy does internally (numpy, of course, works on numpy arrays rather than lists). It uses random.choices() from the standard library:
from random import choices

def weighted_sample_without_replacement(population, weights, k=1):
    weights = list(weights)  # copy, so the caller's weights are not modified
    positions = range(len(population))
    indices = []
    while True:
        needed = k - len(indices)
        if not needed:
            break
        # Draw with replacement, then reject positions that were already taken.
        for i in choices(positions, weights, k=needed):
            if weights[i]:
                weights[i] = 0.0  # a zero weight means this position is used up
                indices.append(i)
    return [population[i] for i in indices]
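One caveat with this rejection approach is that it cannot terminate when k exceeds the number of items with a nonzero weight. The sketch below is a hypothetical variant (the name `weighted_sample_checked` is not from the answer) that applies the same idea but raises an error up front in that case.

```python
from random import choices

def weighted_sample_checked(population, weights, k=1):
    """Hypothetical variant: same rejection idea, plus an up-front feasibility check."""
    weights = list(weights)  # copy, so the caller's weights are not modified
    available = sum(1 for w in weights if w > 0)
    if k > available:
        raise ValueError("k exceeds the number of items with nonzero weight")
    positions = range(len(population))
    indices = []
    while len(indices) < k:
        # Draw the missing number of positions, skipping any already taken.
        for i in choices(positions, weights, k=k - len(indices)):
            if weights[i]:
                weights[i] = 0.0
                indices.append(i)
    return [population[i] for i in indices]

picked = weighted_sample_checked(['a', 'b', 'c', 'd'], [0.4, 0.3, 0.2, 0.1], k=3)
print(picked)
```

The result is a list of three distinct items, with heavier-weighted items more likely to appear.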
This is sometimes called an urn problem. For example, given an urn with 10 red balls, 4 blue balls, and 18 green balls, choose nine balls without replacement.
To do it with numpy, generate the unique selections from the total population count with sample(). Then, bisect the cumulative weights to get the population indices.
import numpy as np
from random import sample
population = np.array(['red', 'blue', 'green'])
counts = np.array([10, 4, 18])
k = 9
cum_counts = np.add.accumulate(counts)
total = cum_counts[-1]
selections = sample(range(total), k=k)
indices = np.searchsorted(cum_counts, selections, side='right')
result = population[indices]
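As a sanity check on the approach above, the same steps can be wrapped in a function and the result tallied. This is only a sketch: the helper name `draw_from_urn` is hypothetical, and the tally confirms that no color is drawn more often than it occurs in the urn.

```python
from collections import Counter
from random import sample

import numpy as np

def draw_from_urn(population, counts, k):
    """Hypothetical helper: pick k distinct balls, then map each to its color."""
    cum_counts = np.add.accumulate(counts)
    total = int(cum_counts[-1])
    # Each integer in range(total) stands for one individual ball.
    selections = sample(range(total), k=k)
    # Find which color's cumulative range each selected ball falls into.
    indices = np.searchsorted(cum_counts, selections, side='right')
    return population[indices]

population = np.array(['red', 'blue', 'green'])
counts = np.array([10, 4, 18])
result = draw_from_urn(population, counts, k=9)

tally = Counter(result.tolist())
print(tally)
```

Note that this technique needs integer counts; fractional weights would have to be handled by one of the other approaches in this thread.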
To do this without numpy, the same approach can be implemented with bisect() and accumulate() from the standard library:
from random import sample
from bisect import bisect
from itertools import accumulate
population = ['red', 'blue', 'green']
weights = [10, 4, 18]
k = 9
cum_weights = list(accumulate(weights))
total = cum_weights.pop()
selections = sample(range(total), k=k)
indices = [bisect(cum_weights, s) for s in selections]
result = [population[i] for i in indices]
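On Python 3.9 or later, random.sample() also accepts a `counts` keyword argument that covers this integer-count case directly. A minimal sketch, assuming that Python version is available:

```python
from random import sample

population = ['red', 'blue', 'green']
counts = [10, 4, 18]

# Equivalent to sampling from a list holding 10 'red', 4 'blue' and 18 'green' entries.
result = sample(population, k=9, counts=counts)
print(result)
```

As with the bisect version, the counts must be integers, so this does not apply to arbitrary floating-point weights.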