Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for O(1) weighted random selection with removal

I'm looking for a weighted random selection algorithm with similar characteristics to the alias method, except where items are removed after they are selected.

For example, I have a bag that produces:

  • A red marble 70% of the time.
  • A green marble 10% of the time.
  • And a blue marble 20% of the time.

I sample the bag and get a red marble. Red is now removed, so I imagine the bag would now produce:

  • A green marble 33% of the time.
  • A blue marble 66% of the time.

I believe you could precompute a tree of every possible probability table, so that sample are still O(1). Are the any more clever algorithms for constant time weighted selection with removal?

like image 547
Matt Bierner Avatar asked Nov 10 '15 07:11

Matt Bierner


1 Answers

Actually, it seems I have misread the question. I was not aware of the alias method, and the answer below is not an analogous algorithm. I'll leave my answer here because it's still informative.


I'm not aware of an O(1) algorithm, but it's easy to do in log(N)2 search and log(N) update. This can likely be improved upon with a more specific algorithm.

Put your elements in a Fenwick tree, with their probabilities as their values. Also when changing elements keep track of the total cumulative probability.

Now we can do every better than just removing elements! We can arbitrarily change the probability of items, but setting an item's probability to 0 is equivalent of removing it. It is then possible to query in log(N) the cumulative probability of the nth element. This logically extends to a log(N)2 binary search of the first element with a cumulative probability larger than p.

Now, to do weighted random selection we generate a number p between 0 and P where P is the total cumulative probability. Then we do the binary search described above to find and select the first element with cumulative probability larger than p.


I've improved on the above, because using a Fenwick tree it's easy to do a log(N) search for the first element with cumulative probality larger than or equal to p. I can strongly suggest to read this explanation of Fenwick trees.

Simply put, to find the element, do a regular binary search on the Fenwick tree as you would on any other tree, with the exception that you keep a current cumulative sum (starting at 0). Whenever you choose the right hand child in the binary search increment the current cumulative sum by the value of the current node. Then when comparing the value of the current node to the cumulative sum we're looking for add the current node's value to the sum so far before comparing.

like image 141
orlp Avatar answered Oct 03 '22 11:10

orlp