Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weighted random selection from array

People also ask

What is weighted random sampling?

In weighted random sampling (WRS) the items are weighted and the probability of each item to be selected is determined by its relative weight.

What is weighted choice?

Weighted random choices mean selecting random elements from a list or an array by the probability of that element. We can assign a probability to each element and according to that element(s) will be selected. By this, we can select one or more than one element from the list, And it can be achieved in two ways.

How do you create a weighted random number?

To generated a random number, weighted with a given probability, you can use a helper table together with a formula based on the RAND and MATCH functions. Notice, we are intentionally shifting the cumulative probability down one row, so that the value in D5 is zero.


Compute the discrete cumulative density function (CDF) of your list -- or in simple terms the array of cumulative sums of the weights. Then generate a random number in the range between 0 and the sum of all weights (might be 1 in your case), do a binary search to find this random number in your discrete CDF array and get the value corresponding to this entry -- this is your weighted random number.


The algorithm is straight forward

rand_no = rand(0,1)
for each element in array 
     if(rand_num < element.probablity)
          select and break
     rand_num = rand_num - element.probability

I have found this article to be the most useful at understanding this problem fully. This stackoverflow question may also be what you're looking for.


I believe the optimal solution is to use the Alias Method (wikipedia). It requires O(n) time to initialize, O(1) time to make a selection, and O(n) memory.

Here is the algorithm for generating the result of rolling a weighted n-sided die (from here it is trivial to select an element from a length-n array) as take from this article. The author assumes you have functions for rolling a fair die (floor(random() * n)) and flipping a biased coin (random() < p).

Algorithm: Vose's Alias Method

Initialization:

  1. Create arrays Alias and Prob, each of size n.
  2. Create two worklists, Small and Large.
  3. Multiply each probability by n.
  4. For each scaled probability pi:
    1. If pi < 1, add i to Small.
    2. Otherwise (pi ≥ 1), add i to Large.
  5. While Small and Large are not empty: (Large might be emptied first)
    1. Remove the first element from Small; call it l.
    2. Remove the first element from Large; call it g.
    3. Set Prob[l]=pl.
    4. Set Alias[l]=g.
    5. Set pg := (pg+pl)−1. (This is a more numerically stable option.)
    6. If pg<1, add g to Small.
    7. Otherwise (pg ≥ 1), add g to Large.
  6. While Large is not empty:
    1. Remove the first element from Large; call it g.
    2. Set Prob[g] = 1.
  7. While Small is not empty: This is only possible due to numerical instability.
    1. Remove the first element from Small; call it l.
    2. Set Prob[l] = 1.

Generation:

  1. Generate a fair die roll from an n-sided die; call the side i.
  2. Flip a biased coin that comes up heads with probability Prob[i].
  3. If the coin comes up "heads," return i.
  4. Otherwise, return Alias[i].

Here is an implementation in Ruby:

def weighted_rand(weights = {})
  raise 'Probabilities must sum up to 1' unless weights.values.inject(&:+) == 1.0
  raise 'Probabilities must not be negative' unless weights.values.all? { |p| p >= 0 }
  # Do more sanity checks depending on the amount of trust in the software component using this method,
  # e.g. don't allow duplicates, don't allow non-numeric values, etc.
  
  # Ignore elements with probability 0
  weights = weights.reject { |k, v| v == 0.0 }   # e.g. => {"a"=>0.4, "b"=>0.4, "c"=>0.2}

  # Accumulate probabilities and map them to a value
  u = 0.0
  ranges = weights.map { |v, p| [u += p, v] }   # e.g. => [[0.4, "a"], [0.8, "b"], [1.0, "c"]]

  # Generate a (pseudo-)random floating point number between 0.0(included) and 1.0(excluded)
  u = rand   # e.g. => 0.4651073966724186
  
  # Find the first value that has an accumulated probability greater than the random number u
  ranges.find { |p, v| p > u }.last   # e.g. => "b"
end

How to use:

weights = {'a' => 0.4, 'b' => 0.4, 'c' => 0.2, 'd' => 0.0}

weighted_rand weights

What to expect roughly:

sample = 1000.times.map { weighted_rand weights }
sample.count('a') # 396
sample.count('b') # 406
sample.count('c') # 198
sample.count('d') # 0

An example in ruby

#each element is associated with its probability
a = {1 => 0.25 ,2 => 0.5 ,3 => 0.2, 4 => 0.05}

#at some point, convert to ccumulative probability
acc = 0
a.each { |e,w| a[e] = acc+=w }

#to select an element, pick a random between 0 and 1 and find the first   
#cummulative probability that's greater than the random number
r = rand
selected = a.find{ |e,w| w>r }

p selected[0]