Python: Selecting numbers with associated probabilities [duplicate]

Tags: python, random, statistics, probability

Possible Duplicates:
Random weighted choice
Generate random numbers with a given (numerical) distribution

I have a list of lists which contains a series of numbers and their associated probabilities.

prob_list = [[1, 0.5], [2, 0.25], [3, 0.05], [4, 0.01], [5, 0.09], [6, 0.1]]

For example, in prob_list[0] the number 1 has a probability of 0.5 associated with it, so you would expect 1 to show up 50% of the time.

How do I add weight to the numbers when I select them?

NOTE: the number of entries in the list can vary from 6 to 100.

EDIT

In the list I have 6 numbers with their associated probabilities. I want to select two numbers based on their probability.

No number can be selected twice: if "2" is selected, it cannot be selected again.


asked Nov 25 '10 11:11

Harpal

2 Answers

I'm going to assume the probabilities all add up to 1. If they don't, you're going to have to scale them accordingly so that they do.
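
Scaling them amounts to dividing each probability by their total. A minimal sketch, assuming the same [number, probability] pair layout as prob_list; the helper name normalize is hypothetical and not part of this answer's code:

```python
def normalize(prob_list):
    """Return a copy of prob_list whose probabilities sum to 1 (hypothetical helper)."""
    total = sum(prob for _, prob in prob_list)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    # Divide every weight by the total so the new values sum to 1
    return [[num, prob / total] for num, prob in prob_list]

weights = [[1, 2], [2, 1], [3, 1]]
print(normalize(weights))  # [[1, 0.5], [2, 0.25], [3, 0.25]]
```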

First generate a uniform random number in [0, 1) using random.random(). Then pass through the list, summing the probabilities. The first time the running sum reaches or exceeds the random number, return the associated number. This way, if the random number falls within the range (0.5, 0.75] in your example, 2 will be returned, giving it the required 0.25 probability of being returned.

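import random

def pick_random(prob_list):
    """Return a number from prob_list, chosen with its associated probability."""
    r, s = random.random(), 0.0
    for num, prob in prob_list:
        s += prob  # running total of the probabilities
        if s >= r:
            return num
    # Only reachable if the probabilities sum to slightly less than 1
    # because of floating-point rounding; fall back to the last number.
    return prob_list[-1][0]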

Here's a test showing it works:

import collections

# Draw 10000 samples and compare each number's frequency to its probability
count = collections.defaultdict(int)
for _ in range(10000):
    count[pick_random(prob_list)] += 1
for n in sorted(count):
    print(n, count[n] / 10000.0)

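which prints each number with its observed frequency; over 10000 draws, the frequencies come out close to the probabilities in prob_list.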

EDIT: Just saw the edit in the question. If you want to select two distinct numbers, you can repeat the selection until the second number chosen differs from the first. But this will be terribly slow if one number has a very high probability (e.g. 0.99999999) associated with it. In that case, you could remove the first number from the list and rescale the remaining probabilities so that they sum to 1 before selecting the second number.
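
The remove-and-rescale idea can be sketched as follows. This is a minimal sketch, assuming prob_list holds [number, probability] pairs as in the question; pick_index and pick_distinct are hypothetical names. Instead of dividing each remaining probability by the new total, it multiplies the random draw by that total, which has the same effect:

```python
import random

def pick_index(items):
    """Return an index into items, weighted by probability.

    The probabilities need not sum to 1: drawing r uniformly from
    [0, total) is equivalent to rescaling them so that they do.
    """
    total = sum(prob for _, prob in items)
    r = random.random() * total
    s = 0.0
    for i, (_, prob) in enumerate(items):
        s += prob  # running total
        if r < s:
            return i
    return len(items) - 1  # guard against floating-point rounding

def pick_distinct(prob_list, k=2):
    """Select k distinct numbers, removing each one before the next draw."""
    if k > len(prob_list):
        raise ValueError("k cannot exceed the number of entries")
    remaining = list(prob_list)  # leave the caller's list untouched
    chosen = []
    for _ in range(k):
        # pop() removes the picked pair, so it cannot be drawn again
        chosen.append(remaining.pop(pick_index(remaining))[0])
    return chosen

prob_list = [[1, 0.5], [2, 0.25], [3, 0.05], [4, 0.01], [5, 0.09], [6, 0.1]]
print(pick_distinct(prob_list))  # two distinct numbers; varies per run
```

Because each draw removes the chosen entry, no number can be selected twice, and there is no retry loop, so a dominant probability such as 0.99999999 costs nothing extra.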


answered Nov 14 '22 22:11

moinudin

Here's something that appears to work and meets all your specifications (and subjectively it seems pretty fast). Note that your constraint that the second number not be the same as the first throws off the probabilities for the second selection. The code below effectively ignores that issue and just enforces the restriction; in other words, the probability that a given number is chosen second won't be the value given for it in prob_list.

import random

prob_list = [[1, 0.5], [2, 0.25], [3, 0.05], [4, 0.01], [5, 0.09], [6, 0.1]]

# create a list with the running total of the probabilities
acc = 0.0
acc_list = [acc]
for t in prob_list:
    acc += t[1]
    acc_list.append(acc)

TOLERANCE = .000001
def approx_eq(v1, v2):
    return abs(v1-v2) <= TOLERANCE

def within(low, value, high):
    """ Determine if low <= value <= high (approximately) """
    return (value > low or approx_eq(low, value)) and \
           (value < high or approx_eq(high, value))

def get_selection():
    """ Find which weighted interval a random selection falls in """
    interval = -1
    rand = random.random()
    for i in range(len(acc_list)-1):
        if within(acc_list[i], rand, acc_list[i+1]):
            interval = i
            break
    if interval == -1:
        raise AssertionError('no interval for {:.6}'.format(rand))
    return interval

def get_two_different_nums():
    sel1 = get_selection()
    sel2 = sel1
    while sel2 == sel1:
        sel2 = get_selection()
    return prob_list[sel1][0], prob_list[sel2][0]
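
The skew mentioned at the start of this answer can be quantified. If the first pick is i (probability p_i) and the second draw is retried until it differs, the second number is j with probability p_j / (1 - p_i), so overall P(second = j) = sum over i != j of p_i * p_j / (1 - p_i). A small sketch computing this; second_pick_distribution is a hypothetical helper, not part of the code above, and it assumes every probability is below 1 (with a probability of exactly 1 the retry loop in get_two_different_nums would never finish):

```python
from fractions import Fraction

def second_pick_distribution(prob_list):
    """Exact P(second number == j) when the second draw is repeated
    until it differs from the first. Assumes every probability < 1."""
    dist = {}
    for j, pj in prob_list:
        # Sum over every possible first pick i that is not j
        dist[j] = sum(pi * pj / (1 - pi) for i, pi in prob_list if i != j)
    return dist

# Fractions keep the arithmetic exact for this small example
example = [[1, Fraction(1, 2)], [2, Fraction(1, 4)], [3, Fraction(1, 4)]]
for num, p in second_pick_distribution(example).items():
    print(num, p)  # each line ends in 1/3
```

In this small example, 1 is drawn first half the time but ends up second only a third of the time. Removing the first number and rescaling the rest, as the other answer suggests, yields exactly the same second-pick distribution.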

answered Nov 14 '22 23:11

martineau
