Picking a random option, where each option has a different probability of being picked

Tags:

algorithm

random

ruby

Suppose that you are given three "options", A, B and C.

Your algorithm must pick and return a random one. For this, it is pretty simple to just put them in an array {A,B,C} and generate a random number (0, 1 or 2) which will be the index of the element in the array to be returned.

Now, there is a variation to this algorithm: suppose that A has a 40% chance of being picked, B 20%, and C 40%. In that case, you could take a similar approach: generate an array {A,A,B,C,C} and pick a random number (0, 1, 2, 3 or 4) as the index of the element to be returned.
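For comparison, the expanded-array approach can be written in a few lines of Ruby. This is only an illustrative sketch: `expanded_picker` is a made-up name, and it assumes the counts are non-negative integers. Its memory use grows with the sum of the counts, which is the inefficiency discussed next.

```ruby
# Expanded-array approach: repeat each option as many times as its count,
# then pick one element of the resulting array uniformly at random.
def expanded_picker(options)
  pool = options.flat_map { |value, count| [value] * count }
  pool.sample
end

# {"A" => 2, "B" => 1, "C" => 2} expands to ["A", "A", "B", "C", "C"].
50.times { print expanded_picker({"A" => 2, "B" => 1, "C" => 2}) }
puts
```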

That works. However, I feel that it is very inefficient. Imagine using this algorithm with a large number of options. You would be creating a fairly big array, perhaps 100 elements each representing 1%. That's still not very big, but if the algorithm is used many times per second, this could be troublesome.

I've considered making a class called Slot, which has two properties: .value and .size. One slot is created for each option, where the .value property is the value of the option and .size is the number of occurrences that option would have in the array. Then I generate a random number from 0 to the total number of occurrences and check which slot the number falls in.

I'm more concerned about the algorithm, but here is my Ruby attempt on this:

class Slot
  attr_accessor :value, :size

  def initialize(value, size)
    @value = value
    @size  = size
  end
end

def picker(options)
  slots = []
  total_size = 0
  options.each do |value, size|
    slots << Slot.new(value, size)
    total_size += size
  end
  pick = rand(total_size)   # integer in 0...total_size
  current_stack = 0
  slots.each do |slot|
    # Use < rather than <=: this slot covers the pick values
    # current_stack .. current_stack + slot.size - 1
    return slot.value if pick < current_stack + slot.size
    current_stack += slot.size
  end
  nil
end

50.times do
  print picker({"A" => 40, "B" => 20, "C" => 40})
end

Which outputs:

CCCCACCCCAAACABAAACACACCCAABACABABACBAAACACCBACAAB

Is there a more efficient way to implement an algorithm that picks a random option, where each option has a different probability of being picked?


asked Oct 09 '13 00:10

Voldemort

2 Answers

The simplest way is probably to write a case statement with hard-coded ranges (here A has a 50% chance, B 25% and C 25%):

def get_random
  case rand(100) + 1   # integer in 1..100
  when 1..50   then 'A'
  when 51..75  then 'B'
  when 76..100 then 'C'
  end
end

The problem with that is that the probabilities are hard-coded. If you want to pass them in as options, you can write a function like the one below; it is very much like the one you wrote, but a bit shorter:

def picker(options)
  current, max = 0, options.values.inject(:+)
  random_value = rand(max) + 1
  options.each do |key, val|
    current += val
    return key if random_value <= current
  end
end

# A with 25% prob, B with 75%.
50.times do
  print picker({"A" => 1, "B" => 3})
end
# => BBBBBBBBBABBABABBBBBBBBABBBBABBBBBABBBBBBABBBBBBBA

# If the weights add up to 100, each number represents a percentage.
50.times do
  print picker({"A" => 40, "T" => 30, "C" => 20, "G" => 10})
end
# => GAAAATATTGTACCTCAATCCAGATACCTTAAGACCATTAAATCTTTACT


answered Sep 24 '22 23:09

hirolau

As a first approximation to a more efficient algorithm, if you compute the cumulative distribution function (which is just one pass over the distribution function, computing a running sum), then you can find the position of the randomly chosen integer using a binary search instead of a linear search. This will help if you have a lot of options, since it reduces the search time from O(#options) to O(log #options).
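A minimal Ruby sketch of the cumulative-sum idea, assuming the same hash-of-weights input as the question (the class name `CumulativePicker` is made up for illustration). It relies on core Ruby's `Array#bsearch_index` in find-minimum mode (Ruby 2.3+) and `Array#sum` (Ruby 2.4+); setup is O(N) and each pick is O(log N).

```ruby
# Cumulative-weight table plus binary search: O(N) setup, O(log N) per pick.
class CumulativePicker
  def initialize(options)
    raise ArgumentError, "weights must sum to a positive number" unless options.values.sum.positive?

    @values = options.keys
    running = 0
    # Running sums, e.g. {"A" => 40, "B" => 20, "C" => 40} gives [40, 60, 100].
    @cumulative = options.values.map { |w| running += w }
    @total = running
  end

  def pick(rng = Random)
    r = rng.rand(@total)   # integer in 0...@total
    # Index of the first running sum strictly greater than r.
    @values[@cumulative.bsearch_index { |c| c > r }]
  end
end

picker = CumulativePicker.new({"A" => 40, "B" => 20, "C" => 40})
50.times { print picker.pick }
puts
```

Because the search looks for the first running sum strictly greater than r, an option with weight 0 can never be returned.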

There is an O(1) solution, though. Here's the basic outline.

Let's say we have N options, 1...N, with weights ω_1...ω_N, where all of the ω values are at least 0. For simplicity, we scale the weights so their mean is 1, or in other words, their sum is N. (We just multiply them by N/Σω. We don't actually have to do this, but it makes the next couple of paragraphs easier to type without MathJax.)

Now, create a vector of N elements, where each element has two option identifiers (lo and hi) and a cutoff p. The option identifiers are just integers 1...N, and p will be computed as a real number in the range [0, 1.0], with both ends included.

We proceed to fill in the vector as follows. For each element i in turn:

- If some ω_j is exactly 1.0, then we set:
     lo_i = j
     hi_i = j
     p_i = 1.0
  and we remove ω_j from the list of weights.
- Otherwise, there must be some ω_j < 1.0 and some ω_k > 1.0. (That's because the average weight is 1.0 and none of the remaining weights has the average value. So some of them must be below it and some above it, because it is impossible for all of them to be greater than the average, or for all of them to be less than the average.) Now we set:
     lo_i = j
     hi_i = k
     p_i = ω_j
     ω_k = ω_k - (1 - ω_j)
  and once again we remove ω_j from the weights.

Note that in both cases, we have removed one weight, and we have reduced the sum of the weights by 1.0. So the average weight is still 1.0.

We continue in this fashion until the entire vector is filled. (The last element will have p = 1.0, since a single remaining weight whose average is 1.0 must itself be 1.0.)
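Here is a sketch of the construction in Ruby, assuming the weights arrive as an array of non-negative numbers with a positive sum. Options are 0-based indices rather than 1...N, and `build_alias_table` is a made-up name. Instead of searching for a weight that is exactly 1.0 (which floating-point rounding makes unreliable), it keeps two work lists, one for weights below 1.0 and one for weights at or above 1.0, pairs one entry from each, and gives whatever is left over a full slot with p = 1.0. This bookkeeping is the variant usually known as Vose's alias method; it may order the slots differently from the description above, and may pair a weight of exactly 1.0 instead of giving it its own slot, but each option still receives the same total probability.

```ruby
# Builds one [lo, hi, p] triple per option, following the steps above.
def build_alias_table(weights)
  n = weights.size
  total = weights.sum.to_f
  raise ArgumentError, "weights must have a positive sum" unless total.positive?

  scaled = weights.map { |w| w * n / total }   # mean weight is now 1.0

  small = []  # indices whose remaining weight is < 1.0
  large = []  # indices whose remaining weight is >= 1.0
  scaled.each_with_index { |w, i| (w < 1.0 ? small : large) << i }

  table = []
  until small.empty? || large.empty?
    j = small.pop
    k = large.pop
    table << [j, k, scaled[j]]        # lo_i = j, hi_i = k, p_i = ω_j
    scaled[k] -= 1.0 - scaled[j]      # ω_k = ω_k - (1 - ω_j)
    (scaled[k] < 1.0 ? small : large) << k
  end
  # Anything left over has weight 1.0 up to rounding, so it gets a full slot.
  (small + large).each { |i| table << [i, i, 1.0] }
  table
end

build_alias_table([40, 20, 40]).each do |lo, hi, cut|
  puts "lo=#{lo} hi=#{hi} p=#{cut.round(3)}"
end
```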

Given this vector, we can select a weighted random option as follows:

Generate a random integer i in the range 1...N and a random floating point value r in the range (0, 1.0]. If r < p_i then we select option lo_i; otherwise, we select option hi_i.
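The selection step needs only a uniform slot index and one uniform float. In this sketch, `TABLE` is a hypothetical hand-built table for the question's weights (A=40, B=20, C=40), produced by following the steps above with B paired against A first, and `alias_pick` is an illustrative name. Ruby's `Random.rand` with no argument returns a float in [0, 1.0) rather than (0, 1.0]; with the strict comparison r < p, either range selects lo_i with probability exactly p_i.

```ruby
# Hypothetical hand-built table for the question's weights A=40, B=20, C=40;
# each entry is [lo, hi, p].
TABLE = [
  ["B", "A", 0.6],
  ["A", "C", 0.8],
  ["C", "C", 1.0]
].freeze

# Selection step: a uniform slot index i, then a uniform float r.
def alias_pick(table, rng = Random)
  lo, hi, cut = table[rng.rand(table.size)]   # i in 0...N
  r = rng.rand                                # r in [0, 1.0)
  r < cut ? lo : hi
end

50.times { print alias_pick(TABLE) }
puts
```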

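It should be clear from the construction of the vector why this works. Each element is chosen with probability 1/N, and within element i option lo_i is chosen with probability p_i and option hi_i with probability 1 - p_i. Whenever an option is placed in an element, it is credited with exactly the amount that was subtracted from its remaining weight, so over the whole vector option j is chosen with probability ω_j/N, which is its share of the total weight. The weights of each above-average-weight option are distributed amongst several vector elements, while each below-average-weight option is assigned to one part of a single vector element, with a corresponding probability of selection.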
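The same bookkeeping can be checked mechanically. This sketch (the helper name `implied_probabilities` is made up) takes a table of [lo, hi, p] triples and adds p/N to each lo option and (1 - p)/N to each hi option. The example table is the same hypothetical hand-built one for the question's weights A=40, B=20, C=40; for it the result should come out, up to floating-point rounding, as A 0.4, B 0.2, C 0.4.

```ruby
# Probability of each option implied by an alias table of [lo, hi, p] triples:
# each slot is chosen with probability 1/N, then lo wins with probability p
# and hi with probability 1 - p.
def implied_probabilities(table)
  n = table.size.to_f
  probs = Hash.new(0.0)
  table.each do |lo, hi, cut|
    probs[lo] += cut / n
    probs[hi] += (1.0 - cut) / n
  end
  probs
end

# Hypothetical hand-built table for the question's weights A=40, B=20, C=40.
example_table = [["B", "A", 0.6], ["A", "C", 0.8], ["C", "C", 1.0]]
implied_probabilities(example_table).each { |option, prob| puts "#{option}: #{prob.round(6)}" }
```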

In a real implementation, we would map the range of weights onto integer values, and make the total weight close to the maximum integer (it has to be a multiple of N, so there will be some slosh). We can then select a slot and select the weight inside the slot from a single random integer.

In fact, we can modify the algorithm to avoid the division by forcing the number of slots to be a power of 2 by adding some 0-weighted options. Because the integer arithmetic will not work out perfectly, a bit of fiddling around will be necessary, but the end result can be made statistically correct, modulo the characteristics of the PRNG being used, and it will execute almost as fast as a simple unweighted selection of N options (one shift and a couple of comparisons extra), at the cost of a vector occupying less than 6N storage elements (counting the possibility of having to almost double the number of slots).
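As a rough illustration of the integer idea, here is a sketch that keeps all arithmetic in integers, assuming non-negative integer weights with a positive sum (`IntegerAliasPicker` is a hypothetical name). Instead of scaling toward the maximum integer, it multiplies each weight by N and gives every slot a capacity equal to the original total weight, which makes the construction exact. It still uses `divmod` to split one random integer into a slot index and a position inside the slot; the shift-and-mask version with power-of-two padding described above would need the rounding "fiddling" and is not attempted here.

```ruby
# Integer alias table: option i is returned with probability weights[i] / weights.sum.
class IntegerAliasPicker
  def initialize(weights)
    @n = weights.size
    @cap = weights.sum                      # every slot holds exactly @cap units
    raise ArgumentError, "weights must sum to a positive integer" unless @cap.positive?

    scaled = weights.map { |w| w * @n }     # integer analogue of scaling to mean 1.0
    small = []                              # indices with scaled weight < @cap
    large = []                              # indices with scaled weight >= @cap
    scaled.each_with_index { |w, i| (w < @cap ? small : large) << i }

    @lo = []
    @hi = []
    @cut = []
    until small.empty? || large.empty?
      j = small.pop
      k = large.pop
      @lo << j
      @hi << k
      @cut << scaled[j]
      scaled[k] -= @cap - scaled[j]
      (scaled[k] < @cap ? small : large) << k
    end
    # Integer arithmetic is exact, so small is empty here and every
    # leftover index in large has exactly @cap units: give each a full slot.
    large.each do |i|
      @lo << i
      @hi << i
      @cut << @cap
    end
  end

  # A single random integer selects both the slot and the position inside it.
  def pick(rng = Random)
    slot, u = rng.rand(@n * @cap).divmod(@cap)
    u < @cut[slot] ? @lo[slot] : @hi[slot]
  end
end

picker = IntegerAliasPicker.new([40, 20, 40])
50.times { print %w[A B C][picker.pick] }
puts
```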

answered Sep 22 '22 23:09

rici
