Annoyingly, the following doesn't work:
from collections import Counter
import random
c = Counter([1,1,1,1,0,0])
random.choice(c) # I expect this to return 1 with probability 2/3,
# and 0 with probability 1/3.
# It actually returns 4 or 2, with probability 1/2
What is the idiomatic way to sample from a multiset in Python (any version)?
Edit yes, I do really need to use a multiset. My actual data is much bigger and just storing it in a list would not be practical.
Edit 2 I need to do this with a reasonable degree of efficiency, as my code will do this repeatedly. There will be a lot of data stored in the Counter object, and anything that involves copying all of this data into a new data structure is not going to be a viable solution.
As the source code shows, Counter is just a subclass of dict. Constructing it is O(n), because it has to iterate over the input, but operations on individual elements remain O(1).
To create a list of unique random numbers, we need to use the sample() method. Warp a range() function inside a sample() to create a list of random numbers without duplicates. The range() function generates the sequence f random numbers.
Accessing Elements in Python Counter To get the list of elements in the counter we can use the elements() method. It returns an iterator object for the values in the Counter.
From the docs:
A common task is to make a random.choice() with weighted probabilities.
If the weights are small integer ratios, a simple technique is to build a sample population with repeats:
>>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)] >>> population = [val for val, cnt in weighted_choices for i in range(cnt)] >>> random.choice(population) 'Green'
A more general approach is to arrange the weights in a cumulative distribution with itertools.accumulate(), and then locate the random value with bisect.bisect():
>>> choices, weights = zip(*weighted_choices) >>> cumdist = list(itertools.accumulate(weights)) >>> x = random.random() * cumdist[-1] >>> choices[bisect.bisect(cumdist, x)] 'Blue'
For your application, you will probably want to use the Counter to build a list of choices and a list of cumulative probabilities, then sample with the second technique.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With