Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala: how can I generate numbers according to an expected distribution?

Sometimes Scala's random doesn't help when wanting to generate random numbers of smaller ranges or when you already know that certain numbers already have an associated probability attached to them. Obviously, scala.util.Random.nextInt won't get the whole job done.

How do I select numbers according to their weights?

like image 641
crockpotveggies Avatar asked Jul 21 '14 15:07

crockpotveggies


People also ask

How do you generate a random number from a given distribution?

Let P(X) be the probability that random number generated according to your distribution is less than X. You start with generating uniform random X between zero and one. After that you find Y such that P(Y) = X and output Y. You could find such Y using binary search (since P(X) is an increasing function of X).

How do you generate random numbers in Scala?

To generate the Random numbers over Scala we use the Random function with the Scala. util. Random class. The Random number generated can be anything be it Integer, Float, Double, char.


1 Answers

Here's a simple explanation. Imagine you have multinomial over the values A,B,C where they have the following probabilities:

  • A = 0.5
  • B = 0.3
  • C = 0.2

If you want to sample a value according to its probability, then that means that roughly 50% of the time you want to get A, 30% of the time B, and 20% of the time C.

So imagine your distribution as a broken stick:

       A         B      C
      0.5       0.3    0.2
|------------|-------|-----|
0           0.5     0.8   1.0

The procedure for sampling from the multinomial starts with a random value p sampled uniformly between 0 and 1. Then you check which portion of the stick p falls in, and return the corresponding value.

So if p = 0.7, then:

       A         B      C
      0.5       0.3    0.2
|------------|-------|-----|
0           0.5     0.8   1.0
                  ^
                 p=0.7

and you would return B.

Code-wise, then would look like:

final def sample[A](dist: Map[A, Double]): A = {
  val p = scala.util.Random.nextDouble
  val it = dist.iterator
  var accum = 0.0
  while (it.hasNext) {
    val (item, itemProb) = it.next
    accum += itemProb
    if (accum >= p)
      return item  // return so that we don't have to search through the whole distribution
  }
  sys.error(f"this should never happen")  // needed so it will compile
}

And you can check it like this:

val dist = Map('A -> 0.5, 'B -> 0.3, 'C -> 0.2)
sample(dist)  // 'A

Vector.fill(1000)(sample(dist)).groupBy(identity).mapValues(_.size) // Map('A -> 510, 'B -> 300, 'C -> 190)

Other things:

  • If dist is not a probability distribution (ie, weights don't sum to one), then you would simply use p = nextDouble * dist.values.sum. So if dist summed to 0.5, then p would be uniform between 0.0 and 0.5; if it summed to 20, then p would be uniform between 0.0 and 20.0.

There are other optimizations you can do, like sorting the entries with the largest probabilities first so that you minimize the number of entries you have to look through before you accumulate p probability mass, but this should get you through the basic idea.

like image 159
dhg Avatar answered Oct 28 '22 17:10

dhg