Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Picking a random option, where each option has a different probability of being picked

Suppose that you are given three "options", A, B and C.

Your algorithm must pick and return a random one. For this, it is pretty simple to just put them in an array {A,B,C} and generate a random number (0, 1 or 2) which will be the index of the element in the array to be returned.

Now, there is a variation to this algorithm: Suppose that A has a 40% chance of being picked, B a 20%, and C a 40%. If that was the case, you could have a similar approach: generate an array {A,A,B,C,C} and have a random number (0, 1, 2, 3, 4) to pick the element to be returned.

That works. However, I feel that it is very inefficient. Imagine using this algorithm for a large amount of options. You would be creating a somewhat big array, maybe with 100 elements representing a 1% each. Now, that's still not quite big, but supposing that your algorithm is used many times per second, this could be troublesome.


I've considered making a class called Slot, which has two properties: .value and .size. One slot is created for each option, where the .value property is the value of the option, and the .size one is the equivalent to the amount of occurrences of such option in the array. Then generate a random number from 0 to the total amount of occurrences and check on what slot did the number fall on.

I'm more concerned about the algorithm, but here is my Ruby attempt on this:

class Slot
  attr_accessor :value
  attr_accessor :size
  def initialize(value,size)
    @value = value
    @size  = size
  end
end

def picker(options)
  slots = []
  totalSize = 0
  options.each do |value,size|
    slots << Slot.new(value,size)
    totalSize += size
  end
  pick = rand(totalSize)
  currentStack = 0
  slots.each do |slot|
    if (pick <= currentStack + slot.size)
      return slot.value
    else
      currentStack += slot.size
    end
  end
  return nil
end

50.times do
  print picker({"A" => 40, "B" => 20, "C" => 40})
end

Which outputs:

CCCCACCCCAAACABAAACACACCCAABACABABACBAAACACCBACAAB


Is there a more efficient way to implement an algorithm that picks a random option, where each option has a different probability of being picked?

like image 448
Voldemort Avatar asked Oct 09 '13 00:10

Voldemort


People also ask

How do you choose elements from a list with different probabilities?

Relative weights to choose elements from the list with different probability. First, define the probability for each element. If you specified the probability using the relative weight, the selections are made according to the relative weights. You can set relative weights using the weight parameter.

How do you do weighted random selection?

For random selection with particular weights, the following technique can then be used: Generate a random number x between 0 and s u m ( w ) − 1 sum(w)-1 sum(w)−1. Find the smallest index that corresponds to the prefix sum greater than the randomly chosen number.

How do you generate random probability?

Generate random value with probability Select a blank cell which you will place the random value at, type this formula =INDEX(A$2:A$8,COUNTIF(C$2:C$8,"<="&RAND())+1), press Enter key. And press F9 key to refresh the value as you need.


2 Answers

The simplest way is probably to write a case statement:

def get_random()
  case rand(100) + 1
    when  1..50   then 'A'
    when 50..75   then 'B'
    when 75..100  then 'C'
  end
end

The problem with that is that you cannot pass any options, so you can write a function like this if you want it to be able to take options. The one below is very much like the one you wrote, but a bit shorter:

def picker(options)
  current, max = 0, options.values.inject(:+)
  random_value = rand(max) + 1
  options.each do |key,val|
     current += val
     return key if random_value <= current
  end
end

# A with 25% prob, B with 75%.
50.times do
  print picker({"A" => 1, "B" => 3})
end
# => BBBBBBBBBABBABABBBBBBBBABBBBABBBBBABBBBBBABBBBBBBA

# If you add upp to 100, the number represent percentage.
50.times do
  print picker({"A" => 40, "T" => 30, "C" => 20, "G" => 10})
end
# => GAAAATATTGTACCTCAATCCAGATACCTTAAGACCATTAAATCTTTACT 
like image 134
hirolau Avatar answered Sep 24 '22 23:09

hirolau


As a first approximation to a more efficient algorithm, if you compute the cumulative distribution function (which is just one pass over the distribution function, computing a running sum), then you can find the position of the randomly chosen integer using a binary search instead of a linear search. This will help if you have a lot of options, since it reduces the search time from O(#options) to O(log #options).

There is an O(1) solution, though. Here's the basic outline.

Let's say we have N options, 1...N, with weights ω1...ωN, where all of the ω values are at least 0. For simplicity, we scale the weights so their mean is 1, or in other words, their sum is N. (We just multiply them by N/Σω. We don't actually have to do this, but it makes the next couple of paragraphs easier to type without MathJax.)

Now, create a vector of N elements, where each element has a two option identifiers (lo and hi) and a cutoff p. The option identifiers are just integers 1...N, and p will be computed as a real number in the range (0, 1.0) inclusive.

We proceed to fill in the vector as follows. For each element i in turn:

  • If some ωj is exactly 1.0, then we set:
       loi = j
       hii = j
       pi = 1.0
    And we remove ωj from the list of weights.

  • Otherwise, there must be some ωj < 1.0 and some ωk > 1.0. (That's because the average weight is 1.0, and none of them have the average value. Some some of them must have less and some of them more, because it is impossible for all elements to be greater than the average or all elements to be less than the average.) Now, we set:
       loi = j
       hii = k
       pi = ωj
       ωk = ωk - (1 - ωj)
    And once again, we remove ωj from the weights.

Note that in both cases, we have removed one weight, and we have reduced the sum of the weights by 1.0. So the average weight is still 1.0.

We continue in this fashion until the entire vector is filled. (The last element will have p = 1.0).

Given this vector, we can select a weighted random option as follows:

  • Generate a random integer i in the range 1...N and a random floating point value r in the range (0, 1.0]. If r < pi then we select option loi; otherwise, we select option hii.

It should be clear why this works from the construction of the vector. The weights of each above-average-weight option are distributed amongst the various vector elements, while each below-average-weight option is assigned to one part of some vector element with a corresponding probability of selection.

In a real implementation, we would map the range of weights onto integer values, and make the total weights close to the maximum integer (it has to be a multiple of N, so there will be some slosh.) We can then select a slot and select the weight inside the slot from a single random integer. In fact, we can modify the algorithm to avoid the division by forcing the number of slots to be a power of 2 by adding some 0-weighted options. Because the integer arithmetic will not work out perfectly, a bit of fiddling around will be necessary, but the end result can be made to be statistically correct, modulo the characteristics of the PRNG being used, and it will execute almost as fast as a simple unweighted selection of N options (one shift and a couple of comparisons extra), at the cost of a vector occupying less than 6N storage elements (counting the possibility of having to almost double the number of slots).

like image 25
rici Avatar answered Sep 22 '22 23:09

rici