Efficiently sampling from a multiset (Counter) in Python

Tags: python

Annoyingly, the following doesn't work:

from collections import Counter
import random

c = Counter([1,1,1,1,0,0])
random.choice(c) # I expect this to return 1 with probability 2/3, 
                 # and 0 with probability 1/3.
                 # It actually returns 4 or 2, with probability 1/2

What is the idiomatic way to sample from a multiset in Python (any version)?

Edit: Yes, I really do need to use a multiset. My actual data is much bigger, and just storing it in a list would not be practical.

Edit 2: I need to do this with a reasonable degree of efficiency, as my code will do this repeatedly. There will be a lot of data stored in the Counter object, and anything that involves copying all of this data into a new data structure is not going to be a viable solution.

asked Apr 13 '14 05:04

N. Virgo

1 Answer

From the docs:

A common task is to make a random.choice() with weighted probabilities.

If the weights are small integer ratios, a simple technique is to build a sample population with repeats:

>>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
>>> population = [val for val, cnt in weighted_choices for i in range(cnt)]
>>> random.choice(population)
'Green'

A more general approach is to arrange the weights in a cumulative distribution with itertools.accumulate(), and then locate the random value with bisect.bisect():

>>> choices, weights = zip(*weighted_choices)
>>> cumdist = list(itertools.accumulate(weights))
>>> x = random.random() * cumdist[-1]
>>> choices[bisect.bisect(cumdist, x)]
'Blue'

For your application, you will probably want to use the Counter to build a list of choices and a list of cumulative counts, then sample with the second technique. Both lists have one entry per distinct element, so the underlying data is not expanded or copied.
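As a rough illustration of that suggestion, here is a minimal sketch. The `CounterSampler` class and its `sample` method are hypothetical names made up for this example, not part of the standard library, and the sketch assumes the Counter holds non-negative integer counts. It builds the cumulative counts once and then uses `bisect` for each draw, so repeated sampling costs O(log n) in the number of distinct elements.

```python
import bisect
import itertools
import random
from collections import Counter


class CounterSampler:
    """Hypothetical helper: draw keys from a Counter in proportion to their counts.

    Assumes the counts are non-negative integers.
    """

    def __init__(self, counter):
        # Keep only keys with a positive count. Memory use grows with the
        # number of distinct keys, not with the total number of elements.
        items = [(key, count) for key, count in counter.items() if count > 0]
        if not items:
            raise ValueError("counter has no elements with a positive count")
        self.keys = [key for key, _ in items]
        # Running totals of the counts, e.g. counts [4, 2] -> [4, 6].
        self.cumulative = list(itertools.accumulate(count for _, count in items))
        self.total = self.cumulative[-1]

    def sample(self, rng=random):
        # Pick an integer position in [0, total) and find the key whose
        # cumulative range contains it.
        position = rng.randrange(self.total)
        return self.keys[bisect.bisect_right(self.cumulative, position)]


c = Counter([1, 1, 1, 1, 0, 0])
sampler = CounterSampler(c)
draws = Counter(sampler.sample() for _ in range(6000))
print(draws)  # about two thirds of the draws should be 1
```

The cumulative list is a snapshot, so it would need to be rebuilt whenever the Counter changes. On Python 3.6 and later, `random.choices(sampler.keys, cum_weights=sampler.cumulative)` should give an equivalent weighted draw (returned as a one-element list) without the hand-written bisect step.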

answered Sep 21 '22 02:09

user2357112 supports Monica
