Randomly selecting an element from a weighted list

Tags:

I have a list of 100,000 objects. Every list element has a "weight" associated with it that is a positive int from 1 to N.

What is the most efficient way to select a random element from the list? I want the behavior that my distribution of randomly chosen elements is the same as the distribution of weights in the list.

For example, if I have a list L = {1,1,2,5}, I want the 4th element to be selected 5/9ths of the time, on average.

Assume inserts and deletes are common on this list, so any approach using "integral area tables" would need to be updated often - hoping there is a solution with O(1) runtime and O(1) extra memory required.

287

asked Dec 22 '10 16:12

John Shedletsky

3 Answers

You can use an augmented binary search tree to store the elements, along with the sum of the weights in each subtree. This lets you insert and delete elements and weights however you want. Both sampling and updates require O(lg n) time per operation, and space usage is O(n).

Sampling is accomplished by generating a random integer in [1, S], where S is the sum of all weights (S is stored at the root of the tree), and performing binary search using the weight-sums stored for each subtree.

answered Sep 30 '22 19:09

jonderry

I really like jonderry's solution but I'm wondering if this problem needs a structure as complex as the augmented binary search tree. What if we kept two arrays, one with the input weights, say a={1,1,2,5} and one with the cumulative weights (very similar idea to jonderry's solution) which would be b={1,2,4,9}. Now generate a random number in [1 9] (say x) and binary search for it in the cumulative sum array. The location i where b[i]<=x and b[i-1]>x is noted and a[i] is returned. So, if the random number were 3, we would get i=3, and a[3]=2 would be returned. This ensures the same complexity as the augmented tree solution with an easier implementation.

answered Sep 30 '22 21:09

kyun

A solution that runs in O(n) would be to start out with selecting the first element. Then for each following element either keep the element you have or replace it with the next one. Let w be the sum of all weights for elements considered so far. Then keep the old one with probability w/(w+x) and choose the new one with p=x/(w+x), where x is the weight of the next element.

answered Sep 30 '22 19:09

villintehaspam

Related questions
                            
                                Printing numbers of the form 2^i * 5^j in increasing order
                            
                                Java 2D weighted data interpolation
                            
                                Divvying people into rooms by last name?
                            
                                Comparing and matching product names from different stores/suppliers
                            
                                Alpha-beta prunning with transposition table, iterative deepening
                            
                                Arrange pairs of numbers so that members of adjacent pairs are equal
                            
                                Fast algorithm for finding a small picture in big picture?
                            
                                Tic Tac Toe perfect AI algorithm: deeper in "create fork" step
                            
                                Range Reduction Poor Precision For Single Precision Floating Point
                            
                                How to implement Dijkstra Algorithm in Haskell
                            
                                Efficient way to recursively calculate dominator tree?
                            
                                Are there common implementation strategies for arbitrary-precision arithmetic which are valid regardless of a specific language?
                            
                                How to find the nonidentical elements from multiple vectors?
                            
                                What is the quickest/easiest way to count active users in last one minute?
                            
                                Find First Unique Element
                            
                                Apache Spark - Dealing with Sliding Windows on Temporal RDDs
                            
                                Inverting an Order-Preserving Minimal Perfect Hash Function in Better than O(K*lg N) Running Time
                            
                                Smart Indent algorithm documentation?
                            
                                Finding the closest point from a set of points on plane
                            
                                Find min. "join" operations for sequence

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Randomly selecting an element from a weighted list

Tags:

algorithm

list

random

statistics