How to do a weighted random sample of categories in Python

Given a list of tuples where each tuple consists of a probability and an item, I'd like to sample an item according to its probability. For example, given the list [(.3, 'a'), (.4, 'b'), (.3, 'c')], I'd like to sample 'b' 40% of the time.

What's the canonical way of doing this in Python?

I've looked at the random module, which doesn't seem to have an appropriate function, and at numpy.random, which has a multinomial function but doesn't seem to return the results in a nice form for this problem. I'm basically looking for something like mnrnd in MATLAB.

Many thanks.

Thanks for all the answers so quickly. To clarify, I'm not looking for explanations of how to write a sampling scheme, but rather to be pointed to an easy way to sample from a multinomial distribution given a set of objects and weights, or to be told that no such function exists in a standard library and so one should write one's own.

661

asked Jun 21 '11 21:06

John

1 Answer

This might do what you want:

import numpy
numpy.array([.3, .4, .3]).cumsum().searchsorted(numpy.random.sample(5))
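
Note that the expression above yields category indices (0, 1 or 2), not the items themselves. A minimal sketch of the same cumulative-sum idea that maps the draws back to items might look like the following. It is written against the standard library rather than numpy, purely so it runs without extra dependencies; that choice and the helper names `sample_items` and `sample_items_builtin` are assumptions made for illustration, not part of the original answer.

```python
import bisect
import itertools
import random

# The (probability, item) pairs from the question.
weighted_items = [(.3, 'a'), (.4, 'b'), (.3, 'c')]


def sample_items(pairs, k, rng=random):
    """Draw k items from (probability, item) pairs via the cumulative sum.

    Hypothetical helper for illustration; not part of any library.
    """
    probs, items = zip(*pairs)
    # Running totals, roughly [0.3, 0.7, 1.0]; the stdlib analogue of cumsum().
    cumulative = list(itertools.accumulate(probs))
    samples = []
    for _ in range(k):
        # Uniform draw in [0, 1); bisect plays the role of searchsorted().
        index = bisect.bisect_right(cumulative, rng.random())
        # Clamp in case round-off leaves the last total slightly below 1.
        samples.append(items[min(index, len(items) - 1)])
    return samples


def sample_items_builtin(pairs, k):
    """Same result using random.choices (Python 3.6+), which takes weights directly."""
    probs, items = zip(*pairs)
    return random.choices(items, weights=probs, k=k)


print(sample_items(weighted_items, 5))
print(sample_items_builtin(weighted_items, 5))
```

If numpy is already in use, numpy.random.choice(items, size=5, p=probs) should give the same kind of result in a single call, assuming the probabilities sum to 1.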

129

answered Sep 20 '22 18:09

sholte

Tags: python, numpy, statistics, probability, random-sample
