How to compute standard deviation errors with scipy.optimize.least_squares

Tags:

I compare fitting with optimize.curve_fit and optimize.least_squares. With curve_fit I get the covariance matrix pcov as an output and I can calculate the standard deviation errors for my fitted variables by that:

perr = np.sqrt(np.diag(pcov))

If I do the fitting with least_squares, I do not get any covariance matrix output and I am not able to calculate the standard deviation errors for my variables.

Here's my example:

#import modules
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import least_squares

noise = 0.5
N = 100
t = np.linspace(0, 4*np.pi, N)

# generate data
def generate_data(t, freq, amplitude, phase, offset, noise=0, n_outliers=0, random_state=0):
    #formula for data generation with noise and outliers
    y = np.sin(t * freq + phase) * amplitude + offset
    rnd = np.random.RandomState(random_state)
    error = noise * rnd.randn(t.size)
    outliers = rnd.randint(0, t.size, n_outliers)
    error[outliers] *= 10
    return y + error

#generate data
data = generate_data(t, 1, 3, 0.001, 0.5, noise, n_outliers=10)

#initial guesses
p0=np.ones(4)
x0=np.ones(4)

# create the function we want to fit
def my_sin(x, freq, amplitude, phase, offset):
    return np.sin(x * freq + phase) * amplitude + offset

# create the function we want to fit for least-square
def my_sin_lsq(x, t, y):
    # freq=x[0]
    # phase=x[1]
    # amplitude=x[2]
    # offset=x[3]
    return (np.sin(t*x[0]+x[2])*x[1]+ x[3]) - y

# now do the fit for curve_fit
fit = curve_fit(my_sin, t, data, p0=p0)
print 'Curve fit output:'+str(fit[0])

#now do the fit for least_square
res_lsq = least_squares(my_sin_lsq, x0, args=(t, data))
print 'Least_squares output:'+str(res_lsq.x)


# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = my_sin(t, *p0)

#data_first_guess_lsq = x0[2]*np.sin(t*x0[0]+x0[1])+x0[3]
data_first_guess_lsq = my_sin(t, *x0)

# recreate the fitted curve using the optimized parameters
data_fit = my_sin(t, *fit[0])
data_fit_lsq = my_sin(t, *res_lsq.x)

#calculation of residuals
residuals = data - data_fit
residuals_lsq = data - data_fit_lsq
ss_res = np.sum(residuals**2)
ss_tot = np.sum((data-np.mean(data))**2)
ss_res_lsq = np.sum(residuals_lsq**2)
ss_tot_lsq = np.sum((data-np.mean(data))**2)

#R squared
r_squared = 1 - (ss_res/ss_tot)
r_squared_lsq = 1 - (ss_res_lsq/ss_tot_lsq)
print 'R squared curve_fit is:'+str(r_squared)
print 'R squared least_squares is:'+str(r_squared_lsq)

plt.figure()
plt.plot(t, data)
plt.title('curve_fit')
plt.plot(t, data_first_guess)
plt.plot(t, data_fit)
plt.plot(t, residuals)

plt.figure()
plt.plot(t, data)
plt.title('lsq')
plt.plot(t, data_first_guess_lsq)
plt.plot(t, data_fit_lsq)
plt.plot(t, residuals_lsq)

#error
perr = np.sqrt(np.diag(fit[1]))
print 'The standard deviation errors for curve_fit are:' +str(perr)

I would be very thankful for any help, best wishes

ps: I got a lot of input from this source and used part of the code Robust regression

319

asked Feb 22 '17 09:02

strohfelder

2 Answers

The result of optimize.least_squares has a parameter inside of it called jac. From the documentation:

jac : ndarray, sparse matrix or LinearOperator, shape (m, n)

Modified Jacobian matrix at the solution, in the sense that J^T J is a Gauss-Newton approximation of the Hessian of the cost function. The type is the same as the one used by the algorithm.

This can be used to estimate the Covariance Matrix of the parameters using the following formula: Sigma = (J'J)^-1.

J = res_lsq.jac
cov = np.linalg.inv(J.T.dot(J))

To find the variance of the parameters one can then use:

var = np.sqrt(np.diagonal(cov))

132

answered Sep 18 '22 16:09

Alex Shmakov

The SciPy program optimize.least_squares requires the user to provide in input a function fun(...) which returns a vector of residuals. This is typically defined as

residuals = (data - model)/sigma

where data and model are vectors with the data to fit and the corresponding model predictions for each data point, while sigma is the 1σ uncertainty in each data value.

In this situation, and assuming one can trust the input sigma uncertainties, one can use the output Jacobian matrix jac returned by least_squares to estimate the covariance matrix. Moreover, assuming the covariance matrix is diagonal, or simply ignoring non-diagonal terms, one can also obtain the 1σ uncertainty perr in the model parameters (often called "formal errors") as follows (see Section 15.4.2 of Numerical Recipes 3rd ed.)

import numpy as np
from scipy import linalg, optimize

res = optimize.least_squares(...)

U, s, Vh = linalg.svd(res.jac, full_matrices=False)
tol = np.finfo(float).eps*s[0]*max(res.jac.shape)
w = s > tol
cov = (Vh[w].T/s[w]**2) @ Vh[w]  # robust covariance matrix
perr = np.sqrt(np.diag(cov))     # 1sigma uncertainty on fitted parameters

The above code to obtain the covariance matrix is formally the same as the following simpler one (as suggested by Alex), but the above has the major advantage that it works even when the Jacobian is close to degenerate, which is a common occurrence in real-world least-squares fits

cov = linalg.inv(res.jac.T @ res.jac)  # covariance matrix when jac not degenerate

If one does not trust the input uncertainties sigma, one can still assume that the fit is good, to estimate the data uncertainties from the fit itself. This corresponds to assuming chi**2/DOF=1, where DOF is the number of degrees of freedom. In this case, one can use the following lines to rescale the covariance matrix before computing the uncertainties

chi2dof = np.sum(res.fun**2)/(res.fun.size - res.x.size)
cov *= chi2dof
perr = np.sqrt(np.diag(cov))    # 1sigma uncertainty on fitted parameters

answered Sep 21 '22 16:09

divenex

Related questions
                            
                                Interactive boxplot with pandas and Jupyter notebook
                            
                                Converting time to epoch (Python) [duplicate]
                            
                                Search and replace for text within a pdf, in Python
                            
                                How to create a table as select in pyspark.sql
                            
                                How to set single element of multi dimensional Numpy Array using another Numpy array?
                            
                                How to handle variable length sublist unpacking in Python2?
                            
                                Why does this Python subprocess command only work when shell=True on Windows?
                            
                                PyQt Event when a variable value is changed
                            
                                Upsample and Interpolate a NumPy Array
                            
                                selecting a specific value from a data frame
                            
                                Read from a large file without loading whole thing into memory using h5py
                            
                                Add multiple values to one numpy array index
                            
                                How to pass keyword argument to function called by concurrent.futures map call
                            
                                Get saved object of a model form in Django?
                            
                                PyQt5 setText by object name?
                            
                                django error on migration: "There is no unique constraint matching given keys for referenced table
                            
                                TensorFlow FileWriter not writing to file
                            
                                Google AppEngine Endpoints Error: Fetching service config failed (status code 404)
                            
                                Get names of positional arguments from function's signature
                            
                                Keras jupyter notebook outputs blocks during training

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to compute standard deviation errors with scipy.optimize.least_squares

Tags:

python

scipy

least-squares

strohfelder

People also ask

2 Answers

Alex Shmakov

divenex

Recent Activity

Donate For Us