I am using Python 3.6 for data fitting. Recently I came across the following problem, and since I lack experience in this area, I'm not sure how to deal with it.
If I use numpy.polyfit(x, y, 1, cov=True) and scipy.optimize.curve_fit(lambda x, a, b: a*x + b, x, y) on the same set of data points, I get nearly the same coefficients a and b. But the values of the covariance matrix from scipy.optimize.curve_fit are roughly half of the values from numpy.polyfit.
Since I want to use the diagonal of the covariance matrix to estimate the uncertainties (u = numpy.sqrt(numpy.diag(cov))) of the coefficients, I have three questions:

1. Which of the two covariance matrices is the correct one (i.e., which one should I trust)?
2. Why do the values differ from each other?
3. Is there anything I can do to resolve the discrepancy?

Thanks!
Edit:
import numpy as np
import scipy.optimize as sc

data = np.array([[1, 2, 3, 4, 5, 6, 7],
                 [1.1, 1.9, 3.2, 4.3, 4.8, 6.0, 7.3]]).T
x = data[:, 0]
y = data[:, 1]

A = np.polyfit(x, y, 1, cov=True)
print('Polyfit:', np.diag(A[1]))

B = sc.curve_fit(lambda x, a, b: a * x + b, x, y)
print('Curve_Fit:', np.diag(B[1]))
If I use statsmodels.api instead, the result corresponds to that of curve_fit.
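That equivalence can also be checked without statsmodels: the textbook OLS covariance, RSS / (n - p) * (XᵀX)⁻¹, reproduces curve_fit's pcov. A sketch (reusing the data from the snippet above; the variable names are just illustrative):

```python
import numpy as np
from scipy import optimize

x = np.array([1., 2., 3., 4., 5., 6., 7.])
y = np.array([1.1, 1.9, 3.2, 4.3, 4.8, 6.0, 7.3])

# design matrix for the straight-line model a*x + b
X = np.column_stack([x, np.ones_like(x)])

# least-squares coefficients and residual sum of squares
coef, rss, rank, sv = np.linalg.lstsq(X, y, rcond=None)
n, p = X.shape

# textbook OLS covariance: RSS / (n - p) * (X^T X)^{-1}
cov_ols = rss[0] / (n - p) * np.linalg.inv(X.T @ X)

# curve_fit reports the same scaling
popt, pcov = optimize.curve_fit(lambda x, a, b: a * x + b, x, y)
print(np.allclose(cov_ols, pcov, rtol=1e-5))  # True
```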
I imagine it has something to do with this snippet from numpy's polyfit source (numpy/lib/polynomial.py):

    # Some literature ignores the extra -2.0 factor in the denominator, but
    # it is included here because the covariance of Multivariate Student-T
    # (which is implied by a Bayesian uncertainty analysis) includes it.
    # Plus, it gives a slightly more conservative estimate of uncertainty.
    if len(x) <= order + 2:
        raise ValueError("the number of data points must exceed order + 2 "
                         "for Bayesian estimate the covariance matrix")
    fac = resids / (len(x) - order - 2.0)
    if y.ndim == 1:
        return c, Vbase * fac
    else:
        return c, Vbase[:, :, NX.newaxis] * fac
Note that in polyfit's source, order is deg + 1, i.e. the number of fitted coefficients, so for your degree-1 fit with 7 data points, len(x) - order is 5 and (len(x) - order - 2.0) is 3. curve_fit divides the residual sum of squares by the first of these (data points minus parameters), while this version of polyfit divides by the second, so polyfit's covariance entries come out larger by a factor of 5/3, which matches the discrepancy you observe. (Newer NumPy versions have since changed polyfit's default scaling to the conventional len(x) - order.) This explains question 2. The answer to question 3 is likely "get more data", as for larger len(x) the ratio of the two denominators approaches 1 and the difference becomes negligible.
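To illustrate, here is a small sketch that applies both denominators to the same fit; the names ratio and pcov_polyfit_style are just illustrative:

```python
import numpy as np
from scipy import optimize

x = np.array([1., 2., 3., 4., 5., 6., 7.])
y = np.array([1.1, 1.9, 3.2, 4.3, 4.8, 6.0, 7.3])

deg = 1
order = deg + 1  # polyfit's internal 'order' = number of coefficients
n = len(x)

popt, pcov = optimize.curve_fit(lambda x, a, b: a * x + b, x, y)

# curve_fit scales the unscaled covariance by rss / (n - order); the old
# polyfit code used rss / (n - order - 2.0) instead, so the two matrices
# differ by exactly the ratio of the two denominators
ratio = (n - order) / (n - order - 2.0)
pcov_polyfit_style = pcov * ratio
print(ratio)  # 1.6666666666666667, i.e. 5/3
```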
Which formulation is correct (question 1) is probably a question for Cross Validated, but I'd assume it is curve_fit's, as that function is explicitly intended to calculate the uncertainties as you state. From the documentation:
pcov : 2d array
The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov)).
The comment in the polyfit code quoted above, on the other hand, says that its intention is geared more toward a Student-t analysis.
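Following the documentation quoted above, extracting the one-standard-deviation errors from pcov might look like this (reusing the question's data):

```python
import numpy as np
from scipy import optimize

x = np.array([1., 2., 3., 4., 5., 6., 7.])
y = np.array([1.1, 1.9, 3.2, 4.3, 4.8, 6.0, 7.3])

popt, pcov = optimize.curve_fit(lambda x, a, b: a * x + b, x, y)
perr = np.sqrt(np.diag(pcov))  # one-standard-deviation uncertainties

for name, value, err in zip(('a', 'b'), popt, perr):
    print(f'{name} = {value:.3f} +/- {err:.3f}')
```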