Using Apache Commons Math to determine confidence intervals

I have a set of benchmark data for which I compute summary statistics using Apache Commons Math. Now I want to use the package to compute confidence intervals for arithmetic means, e.g. of running-time measurements.

Is this possible at all? I am convinced that the package supports this, but I am at a loss about where to start.

This is the solution I ended up using with the help of Brent Worden's suggestion:

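// Package names assume Commons Math 3.x.
import org.apache.commons.math3.distribution.TDistribution;
import org.apache.commons.math3.stat.descriptive.StatisticalSummary;

// Returns the half-width of the two-sided interval (mean +/- result); requires n >= 2.
private double getConfidenceIntervalWidth(StatisticalSummary statistics, double significance) {
    TDistribution tDist = new TDistribution(statistics.getN() - 1);
    double a = tDist.inverseCumulativeProbability(1.0 - significance / 2);
    return a * statistics.getStandardDeviation() / Math.sqrt(statistics.getN());
}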
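In formula terms, the method above appears to compute the margin of error (half of the full interval width) for a two-sided interval at confidence level 1 − α, where α is the significance argument. The symbols below (h, s, x̄) are notation introduced here for illustration, not taken from the original post:

```latex
h = t_{1-\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}},
\qquad
\mathrm{CI} = \left[\, \bar{x} - h,\; \bar{x} + h \,\right]
```

Here s is the sample standard deviation, n the sample size, and x̄ the sample mean, so the interval itself is the mean plus or minus the returned value.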
Jannik Jochem asked Apr 06 '11 10:04



1 Answer

Apache Commons Math does not have direct support for constructing confidence intervals. However, it does have everything needed to compute them.

First, use SummaryStatistics, or some other StatisticalSummary implementation, to summarize your data into sample statistics.

Next, use TDistribution to compute the critical value for your desired confidence level. The degrees of freedom are n - 1, where n is the summary statistics' n property.

Last, use the mean, variance, and n property values from the summary statistics and the t critical value from the distribution to compute your lower and upper confidence limits.
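The last step can be sketched in plain Java as follows. This is a minimal illustration, not the answerer's code: the class and method names are hypothetical, and the t critical value is passed in as a parameter so the sketch has no library dependency. In practice it would come from TDistribution.inverseCumulativeProbability, and the mean and variance from the summary statistics.

```java
/** Minimal sketch: two-sided t confidence interval for a sample mean. */
final class ConfidenceIntervalSketch {

    /**
     * Returns {lower, upper} confidence limits for the mean of the samples.
     * tCritical is assumed to be the (1 - significance / 2) quantile of a
     * t distribution with samples.length - 1 degrees of freedom.
     */
    static double[] meanConfidenceInterval(double[] samples, double tCritical) {
        int n = samples.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }

        double sum = 0.0;
        for (double x : samples) {
            sum += x;
        }
        double mean = sum / n;

        double squaredDeviations = 0.0;
        for (double x : samples) {
            squaredDeviations += (x - mean) * (x - mean);
        }
        // Bias-corrected sample variance (divides by n - 1).
        double variance = squaredDeviations / (n - 1);

        // Margin of error: critical value times the standard error of the mean.
        double margin = tCritical * Math.sqrt(variance / n);
        return new double[] {mean - margin, mean + margin};
    }

    public static void main(String[] args) {
        double[] runningTimes = {10.0, 12.0, 11.0, 13.0, 9.0};
        // Approximate 0.975 quantile of t with 4 degrees of freedom (95% level).
        double tCritical = 2.776445105;
        double[] ci = meanConfidenceInterval(runningTimes, tCritical);
        System.out.printf("95%% CI for the mean: [%.4f, %.4f]%n", ci[0], ci[1]);
    }
}
```

For the five sample values in main, the interval is centered on the mean of 11 and extends roughly 1.96 in each direction.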

Brent Worden answered Oct 04 '22 13:10
