I am looking for a java library / implementation which supports the calculation of the inverse cumulative distribution function for the beta distribution (aka estimation of quantiles) with reasonable precision.
Of course I have tried apache commons math, but in version 3 there still seem to be some issues with the precision. Below the problem which lead to this question is described extensively.
Suppose I want to calculate the credible interval of a beta distribution with a lot of trials. In apache commons math ...
final int trials = 161750;
final int successes = 10007;
final double alpha = 0.05d;
// the supplied precision is the default precision according to the source code
BetaDistribution betaDist = new BetaDistribution(successes + 1, trials - successes + 1, 1e-9);
System.out.println("2.5 percentile :" + betaDist.inverseCumulativeProbability(alpha / 2d));
System.out.println("mean: " + betaDist.getNumericalMean());
System.out.println("median: " + betaDist.inverseCumulativeProbability(0.5));
System.out.println("97.5 percentile :" + betaDist.inverseCumulativeProbability(1 - alpha / 2d));
which delivers
2.5 percentile :0.062030402074808505
mean: 0.06187249616697166
median: 0.062030258659508855
97.5 percentile :0.06305170793994147
The issues is that the 2.5 percentile and median are the same meanwhile both greater than the mean.
In comparison, the R-package binom delivers
binom.confint(10007+1,161750+2,methods=c("agresti-coull","exact","wilson"))
method x n mean lower upper
1 agresti-coull 10008 161752 0.0618725 0.06070873 0.06305707
2 exact 10008 161752 0.0618725 0.06070317 0.06305756
3 wilson 10008 161752 0.0618725 0.06070877 0.06305703
and the R-package stats
qbeta(c(0.025,0.975),10007+1,161750-10007+1)
[1] 0.06070355 0.06305171
To second the results from R, here is what Wolfram Alpha told me
Final note on the requirements:
Update 21.08.12
It seems that the issue has been fixed or at least improved in 3.1-SNAPSHOT of apache-commons-math. For the usecase above
2.5 percentile :0.06070354581340706
mean: 0.06187249616697166
median: 0.06187069085946604
97.5 percentile :0.06305170793994147
Update 23.02.13
While at first glance this question and it's responses may be too localized, I think that it very well illustrates that some numerical problems cannot be solved (efficiently) with a what-first-comes-to-mind-hacker-approach. So I hope it remains open.
The issue has been fixed in apache commons math 3.1.1
The testcase above delivered
2.5 percentile :0.06070354581334864
mean: 0.06187249616697166
median: 0.06187069085930821
97.5 percentile :0.0630517079399996
which matches the results from the r-package stats. Extensive application of 3.1-SNAPSHOT + x versions also did not cause any problems.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With