Suppose that you are given three "options", A, B, and C. Your algorithm must pick and return a random one. For this, it is pretty simple to just put them in an array {A, B, C} and generate a random number (0, 1, or 2) which will be the index of the element in the array to be returned.
Now, there is a variation to this algorithm: suppose that A has a 40% chance of being picked, B a 20%, and C a 40%. In that case, you could take a similar approach: generate an array {A, A, B, C, C} and a random number (0, 1, 2, 3, or 4) to pick the element to be returned.
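For reference, a minimal Ruby sketch of this expanded-array approach (Array#sample picks a uniformly random element):

pool = ["A", "A", "B", "C", "C"]  # A and C appear twice, B once
pick = pool.sample                # "A"/"C" with probability 2/5 each, "B" with 1/5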
That works. However, I feel that it is very inefficient. Imagine using this algorithm for a large number of options: you would be creating a somewhat big array, maybe with 100 elements representing 1% each. Now, that is still not that big, but supposing that your algorithm is used many times per second, it could become troublesome.
I've considered making a class called Slot, which has two properties: .value and .size. One slot is created for each option, where the .value property is the value of the option, and .size is the equivalent of the number of occurrences of that option in the array. Then generate a random number from 0 to the total number of occurrences and check which slot the number falls in.
I'm more concerned about the algorithm, but here is my Ruby attempt at it:
class Slot
  attr_accessor :value
  attr_accessor :size

  def initialize(value, size)
    @value = value
    @size = size
  end
end
def picker(options)
  slots = []
  totalSize = 0
  options.each do |value, size|
    slots << Slot.new(value, size)
    totalSize += size
  end
  pick = rand(totalSize) # integer in 0...totalSize
  currentStack = 0
  slots.each do |slot|
    # Strict comparison: pick is zero-based, so this slot covers
    # the half-open range [currentStack, currentStack + slot.size).
    if pick < currentStack + slot.size
      return slot.value
    else
      currentStack += slot.size
    end
  end
  nil
end
50.times do
  print picker({"A" => 40, "B" => 20, "C" => 40})
end
Which outputs:
CCCCACCCCAAACABAAACACACCCAABACABABACBAAACACCBACAAB
Is there a more efficient way to implement an algorithm that picks a random option, where each option has a different probability of being picked?
The simplest way is probably to write a case statement:
def get_random()
  case rand(100) + 1
  when 1..50   then 'A'
  when 51..75  then 'B'
  when 76..100 then 'C'
  end
end
The problem with that is that you cannot pass it any options. If you want it to be able to take options, you can write a function like the one below, which is very much like the one you wrote, but a bit shorter:
def picker(options)
  current, max = 0, options.values.inject(:+)
  random_value = rand(max) + 1
  options.each do |key, val|
    current += val
    return key if random_value <= current
  end
end
# A with 25% prob, B with 75%.
50.times do
  print picker({"A" => 1, "B" => 3})
end
# => BBBBBBBBBABBABABBBBBBBBABBBBABBBBBABBBBBBABBBBBBBA
# If the weights add up to 100, the numbers represent percentages.
50.times do
  print picker({"A" => 40, "T" => 30, "C" => 20, "G" => 10})
end
# => GAAAATATTGTACCTCAATCCAGATACCTTAAGACCATTAAATCTTTACT
As a first approximation to a more efficient algorithm, if you compute the cumulative distribution function (which is just one pass over the distribution function, computing a running sum), then you can find the position of the randomly chosen integer using a binary search instead of a linear search. This will help if you have a lot of options, since it reduces the search time from O(#options) to O(log #options).
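For instance, here is a minimal Ruby sketch of that idea, reusing the options-hash format from the question (the name picker_bsearch is mine; Range#bsearch finds the first value for which the block returns true). In a real program you would build the prefix sums once and reuse them across draws instead of rebuilding them per call:

def picker_bsearch(options)
  keys = options.keys
  sums = []
  total = 0
  options.each_value do |weight| # one pass: the running sums form the CDF
    total += weight
    sums << total
  end
  pick = rand(total) # integer in 0...total
  # Smallest index whose prefix sum exceeds pick: O(log #options).
  keys[(0...sums.size).bsearch { |i| sums[i] > pick }]
end

50.times { print picker_bsearch({"A" => 40, "B" => 20, "C" => 40}) }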
There is an O(1) solution, though. Here's the basic outline.
Let's say we have N options, 1...N, with weights ω_1...ω_N, where all of the ω values are at least 0. For simplicity, we scale the weights so that their mean is 1, or in other words, their sum is N. (We just multiply each of them by N/Σω. We don't actually have to do this, but it makes the next couple of paragraphs easier to type without MathJax.)
Now, create a vector of N elements, where each element has two option identifiers (lo_i and hi_i) and a cutoff p_i. The option identifiers are just integers 1...N, and p_i will be computed as a real number in the range [0, 1.0].
We proceed to fill in the vector as follows. For each element i in turn:
If some ω_j is exactly 1.0, then we set:

    lo_i = j
    hi_i = j
    p_i = 1.0

and we remove ω_j from the list of weights.
Otherwise, there must be some ω_j < 1.0 and some ω_k > 1.0. (That's because the average weight is 1.0, and none of them has the average value. So some of them must have less and some of them more, because it is impossible for all elements to be greater than the average or all elements to be less than the average.) Now, we set:

    lo_i = j
    hi_i = k
    p_i = ω_j
    ω_k = ω_k − (1 − ω_j)

and once again, we remove ω_j from the weights.
Note that in both cases, we have removed one weight, and we have reduced the sum of the weights by 1.0. So the average weight is still 1.0.
We continue in this fashion until the entire vector is filled. (The last element will have p_i = 1.0.)
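As a concrete illustration, here is one way the construction could look in Ruby. This table-building scheme is essentially the alias method (Walker/Vose); the names build_alias_table, lo, hi, and p follow the text, indices are 0-based rather than 1...N, and the small/large worklists are an implementation detail of this sketch, not part of the description above:

def build_alias_table(weights)
  n = weights.size
  scale = n / weights.inject(:+).to_f # scale so the weights sum to n
  scaled = weights.map { |w| w * scale }
  lo = Array.new(n)
  hi = Array.new(n)
  p  = Array.new(n)
  # Indices with below-average weight vs. average-or-above weight.
  small, large = (0...n).partition { |j| scaled[j] < 1.0 }
  (0...n).each do |i|
    if small.empty? || large.empty?
      # Whatever remains equals 1.0 up to rounding error:
      # the slot holds a single option.
      j = (large.empty? ? small : large).pop
      lo[i] = hi[i] = j
      p[i]  = 1.0
    else
      j = small.pop  # some ω_j < 1.0
      k = large.last # some ω_k >= 1.0
      lo[i], hi[i], p[i] = j, k, scaled[j]
      scaled[k] -= 1.0 - scaled[j] # the rest of the slot comes from k
      # k may now have fallen below the average weight.
      small.push(large.pop) if scaled[k] < 1.0
    end
  end
  [lo, hi, p]
end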
Given this vector, we can select a weighted random option as follows:
Generate a random integer i in the range 1...N and a random floating point value r in the range (0, 1.0]. If r < p_i, then we select option lo_i; otherwise, we select option hi_i.

It should be clear why this works from the construction of the vector. The weight of each above-average-weight option is distributed amongst the various vector elements, while each below-average-weight option is assigned to one part of some vector element with a corresponding probability of selection.
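A matching sketch of the selection step, using the 40/20/40 weights from the question (again illustrative and 0-based; Ruby's bare rand returns a float in [0, 1.0), which is measure-equivalent to the (0, 1.0] range used above):

def alias_pick(lo, hi, p)
  i = rand(lo.size)           # uniform slot index, 0...N
  rand < p[i] ? lo[i] : hi[i] # one comparison: O(1) per draw
end

options = ["A", "B", "C"]
lo, hi, p = build_alias_table([40, 20, 40])
50.times { print options[alias_pick(lo, hi, p)] }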
In a real implementation, we would map the range of weights onto integer values, and make the total weight close to the maximum integer (it has to be a multiple of N, so there will be some slosh). We can then select a slot and select the weight inside the slot from a single random integer. In fact, we can modify the algorithm to avoid the division by forcing the number of slots to be a power of 2, by adding some 0-weighted options. Because the integer arithmetic will not work out perfectly, a bit of fiddling around will be necessary, but the end result can be made to be statistically correct, modulo the characteristics of the PRNG being used, and it will execute almost as fast as a simple unweighted selection of N options (one shift and a couple of comparisons extra), at the cost of a vector occupying less than 6N storage elements (counting the possibility of having to almost double the number of slots).
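To make the single-random-integer idea concrete, one possible shape of the draw (entirely an assumption-level sketch: it presumes the table has been padded to 2**slot_bits slots and the cutoffs rescaled to integers p_int in 0...2**cut_bits, with the fiddling mentioned above already applied):

def alias_pick_int(lo, hi, p_int, slot_bits, cut_bits)
  r = rand(1 << (slot_bits + cut_bits)) # a single random integer
  i = r >> cut_bits                     # high bits pick the slot (no division)
  c = r & ((1 << cut_bits) - 1)         # low bits stand in for r < p_i
  c < p_int[i] ? lo[i] : hi[i]
end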