Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SciPy LeastSq Goodness of Fit Estimator

I have a data surface that I'm fitting using SciPy's leastsq function.

I would like to have some estimate of the quality of the fit after leastsq returns. I'd expected that this would be included as a return from the function, but, if so, it doesn't seem to be clearly documented.

Is there such a return or, barring that, some function I can pass my data and the returned parameter values and fit function to that will give me an estimate of fit quality (R^2 or some such)?

Thanks!

like image 772
Richard Avatar asked Sep 28 '11 19:09

Richard


People also ask

What does SciPy optimize Leastsq do?

leastsq. Minimize the sum of squares of a set of equations. Should take at least one (possibly length N vector) argument and returns M floating point numbers.

How does curve_fit work SciPy?

The SciPy open source library provides the curve_fit() function for curve fitting via nonlinear least squares. The function takes the same input and output data as arguments, as well as the name of the mapping function to use. The mapping function must take examples of input data and some number of arguments.

What is POPT and PCOV?

What does popt and pcov mean? popt- An array of optimal values for the parameters which minimizes the sum of squares of residuals. pcov-2d array which contains the estimated covariance of popt. The diagonals provide the variance of the parameter estimate.

What does SciPy optimize curve_fit?

The SciPy API provides a 'curve_fit' function in its optimization library to fit the data with a given function. This method applies non-linear least squares to fit the data and extract the optimal parameters out of it.


1 Answers

If you call leastsq like this:

import scipy.optimize
p,cov,infodict,mesg,ier = optimize.leastsq(
        residuals,a_guess,args=(x,y),full_output=True)

where

def residuals(a,x,y):
    return y-f(x,a)

then, using the definition of R^2 given here,

ss_err=(infodict['fvec']**2).sum()
ss_tot=((y-y.mean())**2).sum()
rsquared=1-(ss_err/ss_tot)

What is infodict['fvec'] you ask? It's the array of residuals:

In [48]: optimize.leastsq?
...
      infodict -- a dictionary of optional outputs with the keys:
                  'fvec' : the function evaluated at the output

For example:

import scipy.optimize as optimize
import numpy as np
import collections
import matplotlib.pyplot as plt

x = np.array([821,576,473,377,326])
y = np.array([255,235,208,166,157])

def sigmoid(p,x):
    x0,y0,c,k=p
    y = c / (1 + np.exp(-k*(x-x0))) + y0
    return y

def residuals(p,x,y):
    return y - sigmoid(p,x)

Param=collections.namedtuple('Param','x0 y0 c k')
p_guess=Param(x0=600,y0=200,c=100,k=0.01)
p,cov,infodict,mesg,ier = optimize.leastsq(
    residuals,p_guess,args=(x,y),full_output=True)
p=Param(*p)
xp = np.linspace(100, 1600, 1500)
print('''\
x0 = {p.x0}
y0 = {p.y0}
c = {p.c}
k = {p.k}
'''.format(p=p))
pxp=sigmoid(p,xp)

# You could compute the residuals this way:
resid=residuals(p,x,y)
print(resid)
# [ 0.76205302 -2.010142    2.60265297 -3.02849144  1.6739274 ]

# But you don't have to compute `resid` -- `infodict['fvec']` already
# contains the info.
print(infodict['fvec'])
# [ 0.76205302 -2.010142    2.60265297 -3.02849144  1.6739274 ]

ss_err=(infodict['fvec']**2).sum()
ss_tot=((y-y.mean())**2).sum()
rsquared=1-(ss_err/ss_tot)
print(rsquared)
# 0.996768131959

plt.plot(x, y, '.', xp, pxp, '-')
plt.xlim(100,1000)
plt.ylim(130,270)
plt.xlabel('x')
plt.ylabel('y',rotation='horizontal')
plt.grid(True)
plt.show()

enter image description here

like image 163
unutbu Avatar answered Sep 23 '22 16:09

unutbu