Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Standard error in non-linear regression

I have been doing some Monte Carlo physics simulations with Python and I am in unable to determine the standard error for the coefficients of a non-linear least square fit.

Initially, I was using SciPy's scipy.stats.linregress for my model since I thought it would be a linear model but noticed it is actually some sort of power function. I then used NumPy's polyfit with the degrees of freedom being 2 but I can't find anyway to determine the standard error of the coefficients.

I know gnuplot can determine the errors for me but I need to do fits for over 30 different cases. I was wondering if anyone knows of anyway for Python to read the standard error from gnuplot or is there some other library I can use?

like image 268
syntaxing Avatar asked Aug 19 '11 19:08

syntaxing


People also ask

What is the standard error in regression?

The standard error of the regression (S), also known as the standard error of the estimate, represents the average distance that the observed values fall from the regression line. Conveniently, it tells you how wrong the regression model is on average using the units of the response variable.

How do you do standard error in regression analysis?

Unlike R-squared, you can use the standard error of the regression to assess the precision of the predictions. Approximately 95% of the observations should fall within plus/minus 2*standard error of the regression from the regression line, which is also a quick approximation of a 95% prediction interval.

Why do we use SSE instead of r2 in nonlinear regression?

Further, R-squared equals SS Regression / SS Total, which mathematically must produce a value between 0 and 100%. In nonlinear regression, SS Regression + SS Error do not equal SS Total! This completely invalidates R-squared for nonlinear models, and it no longer has to be between 0 and 100%.

What is the standard error of the regression slope?

The standard error of the regression slope, s (also called the standard error of estimate) represents the average distance that your observed values deviate from the regression line. The smaller the “s” value, the closer your values are to the regression line.


2 Answers

Finally found the answer to this long asked question! I'm hoping this can at least save someone a few hours of hopeless research for this topic. Scipy has a special function called curve_fit under its optimize section. It uses the least square method to determine the coefficients and best of all, it gives you the covariance matrix. The covariance matrix contains the variance of each coefficient. More exactly, the diagonal of the matrix is the variance and by square rooting the values, the standard error of each coefficient can be determined! Scipy doesn't have much documentation for this so here's a sample code for a better understanding:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plot


def func(x,a,b,c):
    return a*x**2 + b*x + c #Refer [1]

x = np.linspace(0,4,50)
y = func(x,2.6,2,3) + 4*np.random.normal(size=len(x)) #Refer [2] 


coeff, var_matrix = curve_fit(func,x,y)
variance = np.diagonal(var_matrix) #Refer [3]

SE = np.sqrt(variance) #Refer [4]

#======Making a dictionary to print results========
results = {'a':[coeff[0],SE[0]],'b':[coeff[1],SE[1]],'c':[coeff[2],SE[2]]}

print "Coeff\tValue\t\tError"
for v,c in results.iteritems():
    print v,"\t",c[0],"\t",c[1]
#========End Results Printing=================

y2 = func(x,coeff[0],coeff[1],coeff[2]) #Saves the y values for the fitted model

plot.plot(x,y)
plot.plot(x,y2)

plot.show()
  1. What this function returns is critical because it defines what will used to fit for the model
  2. Using the function to create some arbitrary data + some noise
  3. Saves the covariance matrix's diagonal to a 1D matrix which is just a normal array
  4. Square rooting the variance to get the standard error (SE)
like image 173
syntaxing Avatar answered Oct 12 '22 17:10

syntaxing


it looks like gnuplot uses levenberg-marquardt and there's a python implementation available - you can get the error estimates from the mpfit.covar attribute (incidentally, you should worry about what the error estimates "mean" - are other parameters allowed to adjust to compensate, for example?)

like image 30
andrew cooke Avatar answered Oct 12 '22 17:10

andrew cooke