Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to choose keys from a python dictionary based on weighted probability? [duplicate]

I have a Python dictionary where keys represent some item and values represent some (normalized) weighting for said item. For example:

d = {'a': 0.0625, 'c': 0.625, 'b': 0.3125}
# Note that sum([v for k,v in d.iteritems()]) == 1 for all `d`

Given this correlation of items to weights, how can I choose a key from d such that 6.25% of the time the result is 'a', 32.25% of the time the result is 'b', and 62.5% of the result is 'c'?

like image 469
Joseph Avatar asked Dec 02 '16 07:12

Joseph


3 Answers

def weighted_random_by_dct(dct):
    rand_val = random.random()
    total = 0
    for k, v in dct.items():
        total += v
        if rand_val <= total:
            return k
    assert False, 'unreachable'

Should do the trick. Goes through each key and keeps a running sum and if the random value (between 0 and 1) falls in the slot it returns that key

like image 176
Anthony Sottile Avatar answered Oct 15 '22 03:10

Anthony Sottile


Starting in Python 3.6, you can use the built-in random.choices() instead of having to use Numpy.

So then, if we want to sample (with replacement) 25 keys from your dictionary where the values are the weights/probabilities of being sampled, we can simply write:

import random
random.choices(list(my_dict.keys()), weights=my_dict.values(), k=25)

This outputs a list of sampled keys:

['c', 'b', 'c', 'b', 'b', 'c', 'c', 'c', 'b', 'c', 'b', 'c', 'b', 'c', 'c', 'c', 'c', 'c', 'a', 'b']

If you just want one key, set k to 1 and extract the single element from the list that random.choices returns:

random.choices(list(my_dict.keys()), weights=my_dict.values(), k=1)[0]

(If you don't convert my_dict.keys() to a list, you'll get a TypeError about how it's not subscriptable.)

Here's the relevant snippet from the docs:

random.choices(population, weights=None, *, cum_weights=None, k=1)

Return a k sized list of elements chosen from the population with replacement. If the population is empty, raises IndexError.

If a weights sequence is specified, selections are made according to the relative weights. Alternatively, if a cum_weights sequence is given, the selections are made according to the cumulative weights (perhaps computed using itertools.accumulate()). For example, the relative weights [10, 5, 30, 5] are equivalent to the cumulative weights [10, 15, 45, 50]. Internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work.

If neither weights nor cum_weights are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence. It is a TypeError to specify both weights and cum_weights.

The weights or cum_weights can use any numeric type that interoperates with the float values returned by random() (that includes integers, floats, and fractions but excludes decimals). Weights are assumed to be non-negative.

For a given seed, the choices() function with equal weighting typically produces a different sequence than repeated calls to choice(). The algorithm used by choices() uses floating point arithmetic for internal consistency and speed. The algorithm used by choice() defaults to integer arithmetic with repeated selections to avoid small biases from round-off error.

According to the comments at https://stackoverflow.com/a/39976962/5139284, random.choices is faster for small arrays, and numpy.random.choice is faster for big arrays. numpy.random.choice also provides an option to sample without replacement, whereas there's no built-in Python standard library function for that.

like image 12
mic Avatar answered Oct 15 '22 04:10

mic


If you're planning to do this a lot, you could use numpy to select your keys from a list with weighted probabilities using np.random.choice(). The below example will pick your keys 10,000 times with the weighted probabilities.

import numpy as np

probs = [0.0625, 0.625, 0.3125]
keys = ['a', 'c', 'b']

choice_list = np.random.choice(keys, 10000, replace=True, p=probs)
like image 8
roganjosh Avatar answered Oct 15 '22 02:10

roganjosh