Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating weighted random numbers

Hi I'm doing some code for a genomics class and I am having difficulty on a certain part.

I have a set of mutually exclusive events event1, event2, ... eventn with probabilities p1, p2, ... pn

I want to simulate randomly sampling an event n times with the given probability.

input: probabilities = {0.3, 0.2, 0.5} events{e1,e2,e3} n=100

output: there should be ~50 results for e3, ~20 for e2 and ~30 for e1. Note that these are probably not exactly 50, 20, 30 because empirical values are different from theoretical values...

like image 618
user2812970 Avatar asked Nov 09 '13 02:11

user2812970


People also ask

How do you do weighted random selection?

For random selection with particular weights, the following technique can then be used: Generate a random number between 0 and s u m ( w ) − 1 sum(w)-1 sum(w)−1. Find the smallest index that corresponds to the prefix sum greater than the randomly chosen number.

What is weighted random sampling?

In weighted random sampling (WRS) the items are weighted and the probability of each item to be selected is determined by its relative weight.


1 Answers

Python doesn't have any weighted sampling functionality built in (NumPy/SciPy does), but for a really simple case like this, it's pretty easy:

import itertools
import random

probabilities = [0.3, 0.2, 0.5]
totals = list(itertools.accumulate(probabilities))

def sample():
    n = random.uniform(0, totals[-1])
    for i, total in enumerate(totals):
        if n <= total:
            return i

If you don't have Python 3.2+, you don't have the accumulate function; you can fake it with an inefficient one-liner if the list really is this short:

totals = [sum(probabilities[:i+1]) for i in range(len(probabilities))]

… or you can write an explicit loop, or an ugly reduce call, or copy the equivalent Python function from the docs.


Also, note that random.uniform(0, totals[-1]) is just a more complicated way of writing random.random() if you can be sure that your numbers add up to 1.0.


A quick way to test this:

>>> samples = [sample() for _ in range(100000)]
>>> samples.count(0)
29878
>>> samples.count(1)
19908
>>> samples.count(2)
50214

Those are pretty close to 30%, 20%, and 50% of 100000, respectively.

like image 185
abarnert Avatar answered Oct 13 '22 12:10

abarnert