Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

confidence interval with leastsq fit in scipy python

How to calculate confidence interval for the least square fit (scipy.optimize.leastsq) in python?

like image 850
casper Avatar asked Apr 27 '11 21:04

casper


People also ask

How do you create a 95 confidence interval in Python?

Create a new sample based on our dataset, with replacement and with the same number of points. Calculate the mean value and store it in an array or list. Repeat the process many times (e.g. 1000) On the list of the mean values, calculate 2.5th percentile and 97.5th percentile (if you want a 95% confidence interval)

Which Python function will give the 95% confidence interval?

ppf function. The arguments for t. ppf() are q = percentage, df = degree of freedom, scale = std dev, loc = mean. As t-distribution is symmetric for a 95% confidence interval q will be 0.975.

How do you find the confidence interval for variance in Python?

This approach is used to calculate confidence Intervals for the large dataset where the n>30 and for this, the user needs to call the norm. interval() function from the scipy. stats library to get the confidence interval for a population means of the given dataset where the dataset is normally distributed in python.


2 Answers

I would use bootstrapping method.
See here: http://phe.rockefeller.edu/LogletLab/whitepaper/node17.html

Simple example for noisy gaussian:

x = arange(-10, 10, 0.01)

# model function
def f(p):
    mu, s = p
    return exp(-(x-mu)**2/(2*s**2))

# create error function for dataset    
def fff(d):
    def ff(p):
        return d-f(p)
    return ff

# create noisy dataset from model
def noisy_data(p):
    return f(p)+normal(0,0.1,len(x))

# fit dataset to model with least squares    
def fit(d):
    ff = fff(d)
    p = leastsq(ff,[0,1])[0]
    return p

# bootstrap estimation        
def bootstrap(d):
    p0 = fit(d)
    residuals = f(p0)-d
    s_residuals = std(residuals)

    ps = []
    for i in range(1000):
        new_d = d+normal(0,s_residuals,len(d))
        ps.append(fit(new_d))

    ps = array(ps)
    mean_params = mean(ps,0)
    std_params = std(ps,0)

    return mean_params, std_params

data = noisy_data([0.5, 2.1])
mean_params, std_params = bootstrap(data)

print "95% confidence interval:"
print "mu: ", mean_params[0], " +/- ", std_params[0]*1.95996
print "sigma: ", mean_params[1], " +/- ", std_params[1]*1.95996
like image 114
so12311 Avatar answered Sep 30 '22 08:09

so12311


I am not sure what you mean by confidence interval.

In general, leastsq doesn't know much about the function that you are trying to minimize, so it can't really give a confidence interval. However, it does return an estimate of the Hessian, in other word the generalization of 2nd derivatives to multidimensional problems.

As hinted in the docstring of the function, you could use that information along with the residuals (the difference between your fitted solution and the actual data) to computed the covariance of parameter estimates, which is a local guess of the confidence interval.

Note that it is only a local information, and I suspect that you can strictly speaking come to a conclusion only if your objective function is strictly convex. I don't have any proofs or references on that statement :).

like image 45
Gael Varoquaux Avatar answered Sep 30 '22 08:09

Gael Varoquaux