
scipy.curve_fit vs. numpy.polyfit different covariance matrices

I am using Python 3.6 for data fitting. I recently ran into the following problem and, lacking experience, I am not sure how to deal with it.

If I use numpy.polyfit(x, y, 1, cov=True) and scipy.optimize.curve_fit(lambda x, a, b: a*x+b, x, y) on the same set of data points, I get nearly the same coefficients a and b. But the values in the covariance matrix from curve_fit are roughly half of those from numpy.polyfit.

Since I want to use the diagonal of the covariance matrix to estimate the uncertainties (u = numpy.sqrt(numpy.diag(cov))) of the coefficients, I have three questions:

  1. Which covariance matrix is the right one (i.e. which one should I use)?
  2. Why is there a difference?
  3. What is needed to make them equal?

Thanks!

Edit:

import numpy as np
import scipy.optimize as sc

data = np.array([[1,2,3,4,5,6,7],[1.1,1.9,3.2,4.3,4.8,6.0,7.3]]).T

x=data[:,0]
y=data[:,1]

A=np.polyfit(x,y,1, cov=True)
print('Polyfit:', np.diag(A[1]))

B=sc.curve_fit(lambda x,a,b: a*x+b, x, y)
print('Curve_Fit:', np.diag(B[1]))

If I use the statsmodels.api, the result corresponds to that of curve_fit.

user168000 asked Aug 23 '18 07:08




1 Answer

I imagine it has something to do with this part of the numpy.polyfit source:

    # Some literature ignores the extra -2.0 factor in the denominator, but
    #  it is included here because the covariance of Multivariate Student-T
    #  (which is implied by a Bayesian uncertainty analysis) includes it.
    #  Plus, it gives a slightly more conservative estimate of uncertainty.
    if len(x) <= order + 2:
        raise ValueError("the number of data points must exceed order + 2 "
                         "for Bayesian estimate the covariance matrix")
    fac = resids / (len(x) - order - 2.0)
    if y.ndim == 1:
        return c, Vbase * fac
    else:
        return c, Vbase[:,:, NX.newaxis] * fac

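In polyfit, order is deg + 1, i.e. the number of fitted coefficients, which is 2 for a straight line. With your 7 data points, len(x) - order is 5 (the usual degrees of freedom, which is what curve_fit divides by when absolute_sigma is False), while (len(x) - order - 2.0) is 3. The polyfit values should therefore be larger by a factor of 5/3 ≈ 1.67, which matches curve_fit giving roughly half (0.6 times) the polyfit values.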
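The two denominators can be checked by hand on the data from the question. This is only a sketch: it builds the covariance from the normal equations instead of calling polyfit or curve_fit, and it assumes that order means the number of fitted coefficients (deg + 1 = 2 for a line), as in the polyfit source quoted above. The names v_base, cov_standard and cov_minus2 are mine, not NumPy's.

```python
import numpy as np

# Data from the question
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([1.1, 1.9, 3.2, 4.3, 4.8, 6.0, 7.3])

# Design matrix for y = a*x + b (columns: x, 1)
A = np.column_stack([x, np.ones_like(x)])
n_points, n_params = A.shape          # 7 points, 2 coefficients

# Unscaled covariance (the role played by Vbase in the quoted source)
v_base = np.linalg.inv(A.T @ A)

# Least-squares coefficients and residual sum of squares
coef = v_base @ A.T @ y
resid = y - A @ coef
rss = float(resid @ resid)

# Standard scaling: divide by the degrees of freedom (M - N)
cov_standard = v_base * rss / (n_points - n_params)

# Scaling from the quoted polyfit snippet: divide by (M - N - 2)
cov_minus2 = v_base * rss / (n_points - n_params - 2)

ratio = cov_minus2[0, 0] / cov_standard[0, 0]
print("variance ratio:", ratio)
```

As far as I'm aware, NumPy releases from 1.16 onward dropped the extra -2, so on a recent installation polyfit and curve_fit may already agree; the hand computation shows the difference regardless of the installed version.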

This explains question 2. The answer to question 3 is likely "get more data", since for larger len(x) the extra 2 in the denominator becomes negligible.
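How quickly the difference fades with more data can be sketched in a few lines. The helper scale_ratio below is hypothetical (it is not part of NumPy or SciPy) and assumes the only difference between the two results is the -2 in the denominator of the quoted snippet.

```python
def scale_ratio(n_points, n_params=2):
    """Ratio of a covariance scaled by 1/(M - N - 2) to one scaled by 1/(M - N)."""
    dof = n_points - n_params
    if dof <= 2:
        raise ValueError("need more than n_params + 2 data points")
    return dof / (dof - 2)

# The ratio approaches 1 as the number of points grows
for n in (7, 20, 100, 1000):
    print(n, round(scale_ratio(n), 4))
```

Under the same assumption, multiplying the polyfit covariance by 1 / scale_ratio(len(x)) would bring it to the curve_fit convention without collecting more data.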

Which formulation is correct (question 1) is probably a question for Cross Validated, but I'd assume it is curve_fit, as that is explicitly intended to calculate the uncertainties, as you state. From the documentation:

pcov : 2d array

The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov)).

The comment in the polyfit code above, by contrast, says its scaling is intended for a Bayesian (multivariate Student-T) analysis.
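The perr recipe from the documentation can be tried on the question's data, and the result compared with a covariance scaled by the ordinary degrees of freedom. This is a sketch under the assumption that curve_fit is called with its defaults (no sigma, absolute_sigma=False); the names line and cov_manual are mine.

```python
import numpy as np
from scipy.optimize import curve_fit

# Data from the question
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([1.1, 1.9, 3.2, 4.3, 4.8, 6.0, 7.3])

def line(x, a, b):
    """Straight-line model y = a*x + b."""
    return a * x + b

popt, pcov = curve_fit(line, x, y)

# One-standard-deviation errors on a and b, as in the documentation
perr = np.sqrt(np.diag(pcov))
print("a, b:", popt)
print("errors:", perr)

# Hand-built covariance: inv(A^T A) * RSS / (M - N)
A = np.column_stack([x, np.ones_like(x)])
resid = y - line(x, *popt)
cov_manual = np.linalg.inv(A.T @ A) * (resid @ resid) / (len(x) - len(popt))

print("matches M - N scaling:", np.allclose(pcov, cov_manual, rtol=1e-4))
```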

Daniel F answered Sep 27 '22 20:09
