Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pythonic way to select list elements with different probability [duplicate]

Tags:

python

import random pos = ["A", "B", "C"] x = random.choice["A", "B", "C"] 

This code gives me either "A", "B" or "C" with equal probability. Is there a nice way to express it when you want "A" with 30%, "B" with 40% and "C" with 30% probability?

like image 428
Christian Avatar asked Nov 06 '10 13:11

Christian


People also ask

How do you choose elements from a list with different probabilities?

Relative weights to choose elements from the list with different probability. First, define the probability for each element. If you specified the probability using the relative weight, the selections are made according to the relative weights. You can set relative weights using the weight parameter.

How do you generate random choices in Python?

Use the random. sample() function when you want to choose multiple random items from a list without repetition or duplicates. There is a difference between choice() and choices() . The choices() was added in Python 3.6 to choose n elements from the list randomly, but this function can repeat items.

How do you select an element from a list in Python?

To select elements from a Python list, we will use list. append(). We will create a list of indices to be accessed and the loop is used to iterate through this index list to access the specified element. And then we add these elements to the new list using an index.

How do you find probability in Python?

If you want to know the probability of getting 2 from the second list for drawing 3 for example, you add the probabilities of n_taken = [1, 2, 0] and n_taken = [0, 2, 1] together, which gives 0.13186813186813187 + 0.08791208791208792 = 0.21978021978021978 .


1 Answers

Weights define a probability distribution function (pdf). Random numbers from any such pdf can be generated by applying its associated inverse cumulative distribution function to uniform random numbers between 0 and 1.

See also this SO explanation, or, as explained by Wikipedia:

If Y has a U[0,1] distribution then F⁻¹(Y) is distributed as F. This is used in random number generation using the inverse transform sampling-method.

import random import bisect import collections  def cdf(weights):     total = sum(weights)     result = []     cumsum = 0     for w in weights:         cumsum += w         result.append(cumsum / total)     return result  def choice(population, weights):     assert len(population) == len(weights)     cdf_vals = cdf(weights)     x = random.random()     idx = bisect.bisect(cdf_vals, x)     return population[idx]  weights=[0.3, 0.4, 0.3] population = 'ABC' counts = collections.defaultdict(int) for i in range(10000):     counts[choice(population, weights)] += 1 print(counts)  # % test.py # defaultdict(<type 'int'>, {'A': 3066, 'C': 2964, 'B': 3970}) 

The choice function above uses bisect.bisect, so selection of a weighted random variable is done in O(log n) where n is the length of weights.


Note that as of version 1.7.0, NumPy has a Cythonized np.random.choice function. For example, this generates 1000 samples from the population [0,1,2,3] with weights [0.1, 0.2, 0.3, 0.4]:

import numpy as np np.random.choice(4, 1000, p=[0.1, 0.2, 0.3, 0.4]) 

np.random.choice also has a replace parameter for sampling with or without replacement.


A theoretically better algorithm is the Alias Method. It builds a table which requires O(n) time, but after that, samples can be drawn in O(1) time. So, if you need to draw many samples, in theory the Alias Method may be faster. There is a Python implementation of the Walker Alias Method here, and a numpy version here.

like image 75
unutbu Avatar answered Oct 06 '22 09:10

unutbu