Scala: how can I generate numbers according to an expected distribution?

Tags:

Sometimes Scala's random doesn't help when wanting to generate random numbers of smaller ranges or when you already know that certain numbers already have an associated probability attached to them. Obviously, scala.util.Random.nextInt won't get the whole job done.

How do I select numbers according to their weights?

641

asked Jul 21 '14 15:07

crockpotveggies

1 Answers

Here's a simple explanation. Imagine you have multinomial over the values A,B,C where they have the following probabilities:

A = 0.5
B = 0.3
C = 0.2

If you want to sample a value according to its probability, then that means that roughly 50% of the time you want to get A, 30% of the time B, and 20% of the time C.

So imagine your distribution as a broken stick:

       A         B      C
      0.5       0.3    0.2
|------------|-------|-----|
0           0.5     0.8   1.0

The procedure for sampling from the multinomial starts with a random value p sampled uniformly between 0 and 1. Then you check which portion of the stick p falls in, and return the corresponding value.

So if p = 0.7, then:

       A         B      C
      0.5       0.3    0.2
|------------|-------|-----|
0           0.5     0.8   1.0
                  ^
                 p=0.7

and you would return B.

Code-wise, then would look like:

final def sample[A](dist: Map[A, Double]): A = {
  val p = scala.util.Random.nextDouble
  val it = dist.iterator
  var accum = 0.0
  while (it.hasNext) {
    val (item, itemProb) = it.next
    accum += itemProb
    if (accum >= p)
      return item  // return so that we don't have to search through the whole distribution
  }
  sys.error(f"this should never happen")  // needed so it will compile
}

And you can check it like this:

val dist = Map('A -> 0.5, 'B -> 0.3, 'C -> 0.2)
sample(dist)  // 'A

Vector.fill(1000)(sample(dist)).groupBy(identity).mapValues(_.size) // Map('A -> 510, 'B -> 300, 'C -> 190)

Other things:

If dist is not a probability distribution (ie, weights don't sum to one), then you would simply use p = nextDouble * dist.values.sum. So if dist summed to 0.5, then p would be uniform between 0.0 and 0.5; if it summed to 20, then p would be uniform between 0.0 and 20.0.

There are other optimizations you can do, like sorting the entries with the largest probabilities first so that you minimize the number of entries you have to look through before you accumulate p probability mass, but this should get you through the basic idea.

159

answered Oct 28 '22 17:10

dhg

Related questions
                            
                                Ideal functional language for implementing a full text search with .NET [closed]
                            
                                Scala Monad - Full worked example [closed]
                            
                                Read property file under classpath using scala
                            
                                Spray/Akka missing implicit
                            
                                filter a List according to multiple contains
                            
                                Flatten a list of tuples in Scala?
                            
                                Error: not found: value lit/when - spark scala
                            
                                Spark : Average of values instead of sum in reduceByKey using Scala
                            
                                Scala: how to inherit a "static slot"?
                            
                                Implementing a Measured value in Scala
                            
                                Adding two Set[Any]
                            
                                What are the performance and maintenance considerations of using Scala's structural types?
                            
                                Backquote Used in Scala Swing Event
                            
                                How would I represent a compound key in Scala?
                            
                                What does _ :: mean in Scala?
                            
                                how to avoid race conditions with scala parallel collections
                            
                                Scala, Extend object with a generic trait
                            
                                Local dependencies resolved by SBT but not by Play! Framework
                            
                                Scala: How can I split a String into a Map
                            
                                Scala: Does Option[Boolean] makes sense?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Scala: how can I generate numbers according to an expected distribution?

Tags:

random

scala

distribution

crockpotveggies

People also ask

1 Answers

dhg

Recent Activity

Donate For Us