 

How to estimate goodness-of-fit using scipy.odr?

I am fitting data with weights using scipy.odr, but I don't know how to obtain a measure of goodness-of-fit or an R-squared value. Does anyone have suggestions for how to obtain this measure from the output stored by the function?

asked Jan 28 '14 by Ashley

People also ask

What does ODR do in Python?

ODR stands for Orthogonal Distance Regression, which is used in regression analysis. Basic linear regression is often used to estimate the relationship between two variables y and x by fitting a line of best fit to the data.

What is orthogonal distance regression?

Orthogonal distance regression minimizes the sum of squared perpendicular distances from the points to the fitted curve, unlike ordinary least squares, which minimizes the sum of squared vertical distances. Orthogonal regression is generally applied when both y and x are subject to measurement error, and it can also be applied to transformable non-linear models.
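
For intuition, here is a minimal sketch (with made-up data and noise levels) contrasting the two: ordinary least squares via np.polyfit against an ODR fit of the same line. When x is noisy, the OLS slope is biased toward zero, while ODR accounts for the error in x.

import numpy as np
from scipy import odr

rng = np.random.default_rng(0)
x_true = np.linspace(0, 10, 50)
x = x_true + rng.normal(0, 0.3, 50)   # x observed with error
y = 2*x_true + 1 + rng.normal(0, 0.3, 50)

# ordinary least squares: minimizes squared vertical distances
ols_slope, ols_intercept = np.polyfit(x, y, 1)

# ODR: minimizes squared perpendicular distances
linear = odr.Model(lambda B, x: B[0]*x + B[1])
out = odr.ODR(odr.Data(x, y), linear, beta0=[1., 0.]).run()

# the OLS slope is attenuated toward zero by the noise in x;
# the ODR slope stays closer to the true value of 2
print(ols_slope, out.beta[0])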


2 Answers

The res_var attribute of the Output is the so-called reduced Chi-square value for the fit, a popular choice of goodness-of-fit statistic. It is somewhat problematic for non-linear fitting, though. You can look at the residuals directly (out.delta for the X residuals and out.eps for the Y residuals). Implementing a cross-validation or bootstrap method for determining goodness-of-fit, as suggested in the linked paper, is left as an exercise for the reader.
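
For concreteness, here is a short sketch of pulling those attributes off a completed fit. The R-squared-style number computed at the end from the y residuals is an ad hoc convenience on made-up data, not an official scipy.odr output:

import numpy as np
from scipy import odr

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50) + rng.normal(0, 0.1, 50)
y = 2*x + 1 + rng.normal(0, 0.2, 50)

linear = odr.Model(lambda B, x: B[0]*x + B[1])
out = odr.ODR(odr.Data(x, y), linear, beta0=[1., 0.]).run()

print(out.res_var)    # reduced chi-square of the fit
x_resid = out.delta   # estimated errors in the x values
y_resid = out.eps     # estimated errors in the y values

# an R-squared-like statistic built from the y residuals
# (a convenience, not something scipy.odr reports itself)
ss_res = np.sum(y_resid**2)
ss_tot = np.sum((y - np.mean(y))**2)
print(1 - ss_res/ss_tot)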

answered Oct 12 '22 by Robert Kern

The output of ODR gives both the estimated parameters beta and the standard deviations of those parameters, sd_beta. Following p. 76 of the ODRPACK documentation, you can convert these values into a t-statistic with (beta - beta_0) / sd_beta, where beta_0 is the value against which you are testing significance (often zero). From there, you can use the t-distribution to get the p-value.

Here's a working example:

import numpy as np
from scipy import stats, odr


def linear_func(B, x):
    """
    From https://docs.scipy.org/doc/scipy/reference/odr.html
    Linear function y = m*x + b
    """
    # B is a vector of the parameters.
    # x is an array of the current x values.
    # x is in the same format as the x passed to Data or RealData.
    #
    # Return an array in the same format as y passed to Data or RealData.
    return B[0] * x + B[1]


np.random.seed(0)
sigma_x = .1
sigma_y = .15
N = 100
x_star = np.linspace(0, 10, N)
x = np.random.normal(x_star, sigma_x, N)
# the true underlying function is y = 2*x_star + 1
y = np.random.normal(2*x_star + 1, sigma_y, N)

linear = odr.Model(linear_func)
dat = odr.Data(x, y, wd=1./sigma_x**2, we=1./sigma_y**2)
this_odr = odr.ODR(dat, linear, beta0=[1., 0.])
odr_out = this_odr.run()
# degrees of freedom are n_samples - n_parameters
df = N - 2  # equivalently, df = odr_out.iwork[10]
beta_0 = 0  # test if slope is significantly different from zero
t_stat = (odr_out.beta[0] - beta_0) / odr_out.sd_beta[0]  # t statistic for the slope parameter
p_val = stats.t.sf(np.abs(t_stat), df) * 2
print('Recovered equation: y={:3.2f}x + {:3.2f}, t={:3.2f}, p={:.2e}'.format(odr_out.beta[0], odr_out.beta[1], t_stat, p_val))

Recovered equation: y=2.00x + 1.01, t=239.63, p=1.76e-137
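
Building on the same fit, the linearized confidence interval discussed in the caution below can be formed directly from sd_beta. A sketch, reusing stats, df, and odr_out from the example above (the 95% level is an arbitrary choice):

t_crit = stats.t.ppf(0.975, df)  # two-sided 95% critical value
slope_lo = odr_out.beta[0] - t_crit * odr_out.sd_beta[0]
slope_hi = odr_out.beta[0] + t_crit * odr_out.sd_beta[0]
print('95% CI for the slope: [{:.3f}, {:.3f}]'.format(slope_lo, slope_hi))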

One note of caution in using this approach on nonlinear problems, from the same ODRPACK docs:

"Note that for nonlinear ordinary least squares, the linearized confidence regions and intervals are asymptotically correct as n → ∞ [Jennrich, 1969]. For the orthogonal distance regression problem, they have been shown to be asymptotically correct as σ∗ → 0 [Fuller, 1987]. The difference between the conditions of asymptotic correctness can be explained by the fact that, as the number of observations increases in the orthogonal distance regression problem one does not obtain additional information for ∆. Note also that Vˆ is dependent upon the weight matrix Ω, which must be assumed to be correct, and cannot be confirmed from the orthogonal distance regression results. Errors in the values of wǫi and wδi that form Ω will have an adverse affect on the accuracy of Vˆ and its component parts. The results of a Monte Carlo experiment examining the accuracy of the linearized confidence intervals for four different measurement error models is presented in [Boggs and Rogers, 1990b]. Those results indicate that the confidence regions and intervals for ∆ are not as accurate as those for β.

Despite its potential inaccuracy, the covariance matrix is frequently used to construct confidence regions and intervals for both nonlinear ordinary least squares and measurement error models because the resulting regions and intervals are inexpensive to compute, often adequate, and familiar to practitioners. Caution must be exercised when using such regions and intervals, however, since the validity of the approximation will depend on the nonlinearity of the model, the variance and distribution of the errors, and the data itself. When more reliable intervals and regions are required, other more accurate methods should be used. (See, e.g., [Bates and Watts, 1988], [Donaldson and Schnabel, 1987], and [Efron, 1985].)"
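
When the linearized intervals are suspect, resampling is one of the "more accurate methods" alluded to above. Here is a minimal bootstrap sketch (pair resampling; the 1000 replicates and the percentile method are arbitrary choices, and the data are made up):

import numpy as np
from scipy import odr

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100) + rng.normal(0, 0.1, 100)
y = 2*x + 1 + rng.normal(0, 0.15, 100)

linear = odr.Model(lambda B, x: B[0]*x + B[1])

def fit_slope(xs, ys):
    # refit the line by ODR and return the slope estimate
    return odr.ODR(odr.Data(xs, ys), linear, beta0=[1., 0.]).run().beta[0]

# bootstrap: refit on resampled (x, y) pairs, then take percentile bounds
n = len(x)
slopes = [fit_slope(x[idx], y[idx])
          for idx in (rng.integers(0, n, n) for _ in range(1000))]
lo, hi = np.percentile(slopes, [2.5, 97.5])
print('95% bootstrap CI for the slope: [{:.3f}, {:.3f}]'.format(lo, hi))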

answered Oct 12 '22 by alowet