Selecting without any weights (equal probabilities) is beautifully described here.
I was wondering if there is a way to convert this approach to a weighted one.
I am interested in other approaches as well.
Update: Sampling without replacement
For random selection with particular weights, the following technique can be used: generate a random number between 0 and sum(w) - 1, then find the smallest index whose prefix sum is greater than the randomly chosen number.
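A rough sketch of that prefix-sum idea (the helper name pick_weighted and the integer-weight assumption are mine, not part of the original post):

import random

def pick_weighted(weights):
    # Illustrative helper: assumes the weights are positive integers.
    r = random.randint(0, sum(weights) - 1)   # random number in 0 .. sum(w)-1
    prefix = 0
    for i, w in enumerate(weights):
        prefix += w
        if prefix > r:                        # smallest index whose prefix sum exceeds r
            return i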
In weighted random sampling (WRS), the items are weighted and the probability that each item is selected is determined by its relative weight.
If the sampling is with replacement, you can use this algorithm (implemented here in Python):
import random

items = [(10, "low"), (100, "mid"), (890, "large")]

def weighted_sample(items, n):
    total = float(sum(w for w, v in items))
    i = 0
    w, v = items[0]
    while n:
        x = total * (1 - random.random() ** (1.0 / n))
        total -= x
        while x > w:
            x -= w
            i += 1
            w, v = items[i]
        w -= x
        yield v
        n -= 1
This is O(n + m), where n is the number of samples and m is the number of items.
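One possible way to call it (the output shown in the comment is just one plausible result, since the sample is random):

items = [(10, "low"), (100, "mid"), (890, "large")]
print(list(weighted_sample(items, 3)))
# e.g. ['large', 'large', 'mid'] -- "large" tends to dominate, since it holds 890/1000 of the weight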
Why does this work? It is based on the following algorithm:
def n_random_numbers_decreasing(v, n):
    """Like reversed(sorted(v * random() for i in range(n))),
    but faster because we avoid sorting."""
    while n:
        v *= random.random() ** (1.0 / n)
        yield v
        n -= 1
The function weighted_sample is just this algorithm fused with a walk of the items list to pick out the items selected by those random numbers.
This in turn works because the probability that n random numbers in 0..v will all happen to be less than z is P = (z/v)^n. Solve for z, and you get z = v * P^(1/n). Substituting a random number for P picks the largest number with the correct distribution; and we can just repeat the process to select all the other numbers.
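As a quick sanity check of that derivation (a sketch of my own, not from the original answer): the largest of n uniform draws on [0, v] has CDF (z/v)^n, so drawing it directly as v * random()**(1/n) should match the distribution of the brute-force maximum.

import random

def max_of_uniforms(v, n):
    # brute force: take the maximum of n uniform draws on [0, v]
    return max(random.uniform(0, v) for _ in range(n))

def max_via_inverse_cdf(v, n):
    # direct draw using z = v * P**(1/n) with P uniform on [0, 1]
    return v * random.random() ** (1.0 / n)

v, n, trials = 1.0, 5, 100000
a = sum(max_of_uniforms(v, n) for _ in range(trials)) / trials
b = sum(max_via_inverse_cdf(v, n) for _ in range(trials)) / trials
print(a, b)   # both sample means should be close to v * n / (n + 1) = 0.833...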
If the sampling is without replacement, you can put all the items into a binary heap, where each node caches the total of the weights of all items in that subheap. Building the heap is O(m). Selecting a random item from the heap, respecting the weights, is O(log m). Removing that item and updating the cached totals is also O(log m). So you can pick n items in O(m + n log m) time.
(Note: "weight" here means that every time an element is selected, the remaining possibilities are chosen with probability proportional to their weights. It does not mean that elements appear in the output with a likelihood proportional to their weights.)
Here's an implementation of that, plentifully commented:
import random

class Node:
    # Each node in the heap has a weight, value, and total weight.
    # The total weight, self.tw, is self.w plus the weight of any children.
    __slots__ = ['w', 'v', 'tw']
    def __init__(self, w, v, tw):
        self.w, self.v, self.tw = w, v, tw

def rws_heap(items):
    # h is the heap. It's like a binary tree that lives in an array.
    # It has a Node for each pair in `items`. h[1] is the root. Each
    # other Node h[i] has a parent at h[i>>1]. Each node has up to 2
    # children, h[i<<1] and h[(i<<1)+1]. To get this nice simple
    # arithmetic, we have to leave h[0] vacant.
    h = [None]                          # leave h[0] vacant
    for w, v in items:
        h.append(Node(w, v, w))
    for i in range(len(h) - 1, 1, -1):  # total up the tws
        h[i>>1].tw += h[i].tw           # add h[i]'s total to its parent
    return h

def rws_heap_pop(h):
    gas = h[1].tw * random.random()     # start with a random amount of gas

    i = 1                     # start driving at the root
    while gas >= h[i].w:      # while we have enough gas to get past node i:
        gas -= h[i].w         #   drive past node i
        i <<= 1               #   move to first child
        if gas >= h[i].tw:    #   if we have enough gas:
            gas -= h[i].tw    #     drive past first child and descendants
            i += 1            #     move to second child
    w = h[i].w                # out of gas! h[i] is the selected node.
    v = h[i].v

    h[i].w = 0                # make sure this node isn't chosen again
    while i:                  # fix up total weights
        h[i].tw -= w
        i >>= 1
    return v

def random_weighted_sample_no_replacement(items, n):
    heap = rws_heap(items)              # just make a heap...
    for i in range(n):
        yield rws_heap_pop(heap)        # and pop n items off it.
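A possible usage (again, the commented output is only one plausible result of the random draw):

items = [(10, "low"), (100, "mid"), (890, "large")]
print(list(random_weighted_sample_no_replacement(items, 2)))
# e.g. ['large', 'mid'] -- each value can appear at most once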