How to calculate the likelihood of curve-fitting in scipy?

1 Answers

Your likelihood function

enter image description here

which is simply the sum of log of probability density function of Gaussian distribution.

$enter image description here$

is the likelihood of fitting a mu and a sigma for your residue, not the likelihood of your model given your data. In one word, your approach is wrong.

Sine you are doing non-linear least square, following what @usethedeathstar already mentioned, you should go straight for F-test. . Consider the following example, modified from http://www.walkingrandomly.com/?p=5254, and we conduct F-test using R. And we will discuss how to translate it into python in the end.

# construct the data vectors using c()
> xdata = c(-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9)
> ydata = c(0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001)
# some starting values
> p1 = 1
> p2 = 0.2
> p3 = 0.01

# do the fit
> fit1 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), start=list(p1=p1,p2=p2))
> fit2 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata)+p3*xdata, start=list(p1=p1,p2=p2,p3=p3))

# summarise
> summary(fit1)

Formula: ydata ~ p1 * cos(p2 * xdata) + p2 * sin(p1 * xdata)

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
p1 1.881851   0.027430   68.61 2.27e-12 ***
p2 0.700230   0.009153   76.51 9.50e-13 ***
---
Signif. codes:  0 ?**?0.001 ?*?0.01 ??0.05 ??0.1 ??1

Residual standard error: 0.08202 on 8 degrees of freedom

Number of iterations to convergence: 7 
Achieved convergence tolerance: 2.189e-06

> summary(fit2)

Formula: ydata ~ p1 * cos(p2 * xdata) + p2 * sin(p1 * xdata) + p3 * xdata

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
p1  1.90108    0.03520  54.002 1.96e-10 ***
p2  0.70657    0.01167  60.528 8.82e-11 ***
p3  0.02029    0.02166   0.937     0.38    
---
Signif. codes:  0 ?**?0.001 ?*?0.01 ??0.05 ??0.1 ??1

Residual standard error: 0.08243 on 7 degrees of freedom

Number of iterations to convergence: 9 
Achieved convergence tolerance: 2.476e-06

> anova(fit2, fit1)
Analysis of Variance Table

Model 1: ydata ~ p1 * cos(p2 * xdata) + p2 * sin(p1 * xdata) + p3 * xdata
Model 2: ydata ~ p1 * cos(p2 * xdata) + p2 * sin(p1 * xdata)
  Res.Df Res.Sum Sq Df     Sum Sq F value Pr(>F)
1      7   0.047565                             
2      8   0.053813 -1 -0.0062473  0.9194 0.3696

here we have two model, fit1 has 2 parameters, therefore the residue has 8 degrees-of-freedom; fit2 has one additional parameter and the residue has 7 degrees of freedom. Is model 2 significantly better? No, the F value is 0.9194, on (1,7) degrees of freedom and it is not significant.

To get the ANOVA table: Residue DF is easy. Residue Sum of squares: 0.08202*0.08202*8=0.05381 and 0.08243*0.08243*7=0.04756293 (notice: 'Residual standard error: 0.08243 on 7 degrees of freedom', etc). In python, you can get it by (y_observed-y_fitted)**2, since scipy.optimize.curve_fit() doesn't return the residues.

The F-ratio is 0.0062473/0.047565*7 and to get P-value: 1-scipy.stats.f.cdf(0.9194, 1, 7).

Put them together we have python equivalent:

In [1]:

import scipy.optimize as so
import scipy.stats as ss
xdata = np.array([-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9])
ydata = np.array([0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001])
def model0(x,p1,p2):
    return p1*np.cos(p2*x) + p2*np.sin(p1*x)
def model1(x,p1,p2,p3):
    return p1*np.cos(p2*x) + p2*np.sin(p1*x)+p3*x
p1, p2, p3 = 1, 0.2, 0.01
fit0=so.curve_fit(model0, xdata, ydata, p0=(p1,p2))[0]
fit1=so.curve_fit(model1, xdata, ydata, p0=(p1,p2,p3))[0]
yfit0=model0(xdata, fit0[0], fit0[1])
yfit1=model1(xdata, fit1[0], fit1[1], fit1[2])
ssq0=((yfit0-ydata)**2).sum()
ssq1=((yfit1-ydata)**2).sum()
df=len(xdata)-3
f_ratio=(ssq0-ssq1)/(ssq1/df)
p=1-ss.f.cdf(f_ratio, 1, df)
In [2]:

print f_ratio, p
0.919387419515 0.369574503394

As @usethedeathstar pointed out: when you the residue is normally distributed, nonlinear least square IS the maximum likelihood. Therefore F-test and likelihood ratio test is equivalent. Because, F-ratio is a monotone transformation of the likelihood ratio λ.

Or in a descriptive way, see: http://www.stata.com/support/faqs/statistics/chi-squared-and-f-distributions/

answered Nov 15 '22 02:11

CT Zhu

Related questions
                            
                                Native bridge between Python and Dalvik or AAF
                            
                                Python not closing file descriptors
                            
                                Python Joining csv files where key is first column value
                            
                                looping through loops in python?
                            
                                Adding annotation to data points
                            
                                Pypi: can I claim to be the new maintainer of an unmaintained package?
                            
                                shutil.rmtree to remove files only?
                            
                                Avoiding unnecessary key evaluations when sorting a list
                            
                                Pandas: plot multiple columns to same x value
                            
                                Mock - testing if a method is called without specifying arguments
                            
                                Python packages duplicates installed with pip and conda
                            
                                Is there a way with biopython to obtain the full abstract from a pubmed article?
                            
                                how to create a dictionary of pandas dataframes, and return the dataframes into excel worksheets?
                            
                                Using scipy.spatial.Delaunay in place of matplotlib.tri.Triangulation's built-in version
                            
                                Meaning of $ in a string?
                            
                                How to disable hostname checking in requests python
                            
                                autoenv executes even in subfolder
                            
                                determining test coverage with py.test + gevent
                            
                                Meaning of error numbers in Python exceptions
                            
                                Problems of installing scikit-learn in Python

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to calculate the likelihood of curve-fitting in scipy?

Tags:

python

numpy

scipy

statsmodels

Yuxiang Wang

People also ask

1 Answers

CT Zhu

Recent Activity

Donate For Us