Sometimes Scala's random doesn't help when wanting to generate random numbers of smaller ranges or when you already know that certain numbers already have an associated probability attached to them. Obviously, scala.util.Random.nextInt
won't get the whole job done.
How do I select numbers according to their weights?
Let P(X) be the probability that random number generated according to your distribution is less than X. You start with generating uniform random X between zero and one. After that you find Y such that P(Y) = X and output Y. You could find such Y using binary search (since P(X) is an increasing function of X).
To generate the Random numbers over Scala we use the Random function with the Scala. util. Random class. The Random number generated can be anything be it Integer, Float, Double, char.
Here's a simple explanation. Imagine you have multinomial over the values A,B,C where they have the following probabilities:
If you want to sample a value according to its probability, then that means that roughly 50% of the time you want to get A, 30% of the time B, and 20% of the time C.
So imagine your distribution as a broken stick:
A B C
0.5 0.3 0.2
|------------|-------|-----|
0 0.5 0.8 1.0
The procedure for sampling from the multinomial starts with a random value p
sampled uniformly between 0 and 1. Then you check which portion of the stick p
falls in, and return the corresponding value.
So if p = 0.7
, then:
A B C
0.5 0.3 0.2
|------------|-------|-----|
0 0.5 0.8 1.0
^
p=0.7
and you would return B
.
Code-wise, then would look like:
final def sample[A](dist: Map[A, Double]): A = {
val p = scala.util.Random.nextDouble
val it = dist.iterator
var accum = 0.0
while (it.hasNext) {
val (item, itemProb) = it.next
accum += itemProb
if (accum >= p)
return item // return so that we don't have to search through the whole distribution
}
sys.error(f"this should never happen") // needed so it will compile
}
And you can check it like this:
val dist = Map('A -> 0.5, 'B -> 0.3, 'C -> 0.2)
sample(dist) // 'A
Vector.fill(1000)(sample(dist)).groupBy(identity).mapValues(_.size) // Map('A -> 510, 'B -> 300, 'C -> 190)
Other things:
dist
is not a probability distribution (ie, weights don't sum to one), then you would simply use p = nextDouble * dist.values.sum
. So if dist
summed to 0.5, then p
would be uniform between 0.0 and 0.5; if it summed to 20, then p
would be uniform between 0.0 and 20.0.There are other optimizations you can do, like sorting the entries with the largest probabilities first so that you minimize the number of entries you have to look through before you accumulate p
probability mass, but this should get you through the basic idea.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With