I have a set of integers each of which has a probability assigned, derived from earlier experiments, e.g.:
0 = 0.5
1 = 0.2
2 = 0.3
Complying with the specifications of a probability distribution, these weights sum up to 1.0. I am now looking for an efficient way to sample one of the values while taking the given probabilities into account, e.g. (pseude-code):
Distribution distribution = new DiscreteDistribution(new double[]{0.5, 0.3, 0.2});
distribution.sample();
This should result in 0 half of the time according to the given numbers. However, do not assume any patterns or regularities among these.
I've been using Apache Commons Math for my previous experiments, but it does not seem to provide a solution for this scenario, neither does Colt.
I wonder whether this is because I've missed an easy solution. A naive implemententation seems more or less straight-forward, but doing this efficiently is rather involved. That is why I am looking for an established implementation.
A discrete probability distribution counts occurrences that have countable or finite outcomes. This is in contrast to a continuous distribution, where outcomes can fall anywhere on a continuum. Common examples of discrete distribution include the binomial, Poisson, and Bernoulli distributions.
To simulate probability in Java, the first thing we need to do is to generate random numbers. Fortunately, Java provides us with plenty of random numbers generators. Here, we drew numbers from 1 to 100. The chance for our random number to be lesser or equal to 50 is exactly 50%.
A discrete distribution is a distribution of data in statistics that has discrete values. Discrete values are countable, finite, non-negative integers, such as 1, 10, 15, etc.
Given the simplicity of the quantile function and the triviality of a manual implementation, I don't see any harm in writing this out explicitly.
Once you've drawn your random number r
in [0, 1), use
if (r <= 0.5/*micro-optimisation: most likely case first*/){
return 0;
} else if (r <= 0.8/*then the next most likely case*/){
return 2;
} else {
return 1;
}
Perhaps things get a little more fancy for more than 3 numbers, consider building up a table to represent the quantile function in such cases, at the expense of some degradation in performance.
(It would be difficult to beat my solution in terms of speed, in the worst case you have a couple of branches - and you're helping a branch predictor in the nicest way you possibly can, and the random number drawing will be where the performance bottleneck is).
A very simple generic solution would be:
class Distribution<T>{
List<Double> probs = new ArrayList<>();
List<T> events = new ArrayList<>();
double sumProb;
Random rand = new Random();
Distribution(Map<T,Double> probs){
for(T event : probs.keySet()){
sumProb += probs.get(event);
events.add(event);
this.probs.add(probs.get(event));
}
}
public T sample(){
T value;
double prob = rand.nextDouble()*sumProb;
int i;
for(i=0; prob>0; i++){
prob-= probs.get(i);
}
return events.get(i-1);
}
}
Feel free to change it, as you need it, e.g. with adding other constructors. Of course here is a lot of stuff to improve, starting with the efficiency, but it is something you can reuse later a lot.
Calling Random.nextDouble()
is a fairly expensive operation. You are better off using Random.nextInt(n)
in this case
int num = rand.nextInt(10);
return num <= 5 ? 0 : num <= 8 ? 1 : 2;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With