
Difference between fitting algorithms in scipy

I have a question about the fitting algorithms in scipy. In my program, I have a set of x and y data points with errors on y only, and I want to fit the function

f(x) = (a[0] - a[1])/(1+np.exp(x-a[2])/a[3]) + a[1]

to it.
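Written out as a Python function (I use this form in the sketches below; a is the parameter vector):

    import numpy as np

    def f(x, a):
        # smooth step between a[0] (low x) and a[1] (high x);
        # a[2] and a[3] locate the transition region
        return (a[0] - a[1]) / (1 + np.exp(x - a[2]) / a[3]) + a[1]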

The problem is that I get absurdly high errors on the parameters, and also different values and errors for the fit parameters, from the two scipy fit routines scipy.odr.ODR (using its least-squares mode) and scipy.optimize.leastsq. Here is my example:

Fit with scipy.odr.ODR, fit_type=2

Beta: [ 11.96765963 68.98892582 100.20926023 0.60793377]
Beta Std Error: [ 4.67560801e-01 3.37133614e+00 8.06031988e+04 4.90014367e+04]
Beta Covariance: [[ 3.49790629e-02 1.14441187e-02 -1.92963671e+02 1.17312104e+02]
[ 1.14441187e-02 1.81859542e+00 -5.93424196e+03 3.60765567e+03]
[ -1.92963671e+02 -5.93424196e+03 1.03952883e+09 -6.31965068e+08]
[ 1.17312104e+02 3.60765567e+03 -6.31965068e+08 3.84193143e+08]]
Residual Variance: 6.24982731975
Inverse Condition #: 1.61472215874e-08
Reason(s) for Halting:
Sum of squares convergence
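
This fit is set up roughly as follows (only a sketch; the full code is in the pastebin link below, x, y, y_err stand for my data arrays and y uncertainties, and beta0 is just an illustrative starting guess):

    from scipy import odr

    # f is the model function defined above; odr expects fcn(beta, x)
    model = odr.Model(lambda a, x: f(x, a))
    data = odr.RealData(x, y, sy=y_err)          # y errors only
    fit = odr.ODR(data, model, beta0=[12.0, 69.0, 100.0, 1.0])
    fit.set_job(fit_type=2)                      # fit_type=2: ordinary least squares
    output = fit.run()
    output.pprint()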

and then the fit with scipy.optimize.leastsq:

Fit with scipy.optimize.leastsq

beta: [ 11.9671859 68.98445306 99.43252045 1.32131099]
Beta Std Error: [0.195503 1.384838 34.891521 45.950556]
Beta Covariance: [[ 3.82214235e-02 -1.05423284e-02 -1.99742825e+00 2.63681933e+00]
[ -1.05423284e-02 1.91777505e+00 1.27300761e+01 -1.67054172e+01]
[ -1.99742825e+00 1.27300761e+01 1.21741826e+03 -1.60328181e+03]
[ 2.63681933e+00 -1.67054172e+01 -1.60328181e+03 2.11145361e+03]]
Residual Variance: 6.24982904455 (calculated by me)
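
The leastsq fit and its standard errors are obtained roughly like this (again only a sketch; scaling the covariance matrix by the residual variance is the part I calculated myself):

    import numpy as np
    from scipy.optimize import leastsq

    def residuals(a, x, y, y_err):
        # weighted residuals: deviation from the model divided by the y errors
        return (y - f(x, a)) / y_err

    beta0 = [12.0, 69.0, 100.0, 1.0]             # illustrative starting guess
    beta, cov, info, msg, ier = leastsq(residuals, beta0, args=(x, y, y_err),
                                        full_output=True)

    # leastsq returns an unscaled covariance matrix; scale it by the residual
    # variance to get the parameter standard errors
    dof = len(x) - len(beta)
    res_var = np.sum(residuals(beta, x, y, y_err) ** 2) / dof
    beta_err = np.sqrt(np.diag(cov) * res_var)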

My point is the third fit parameter. The results are:

scipy.odr.ODR, fit_type=2: C = 100.209 +/- 80600

scipy.optimize.leastsq: C = 99.432 +/- 12.730

I don't know why the first error is so much higher. Even stranger: if I put exactly the same data points with errors into Origin 9, I get C = x0 = 99.41849 +/- 0.20283,

and with exactly the same data in C++ ROOT (CERN) I get C = 99.85 +/- 1.373,

even though I used exactly the same initial values for ROOT and Python. Origin doesn't need any.

Do you have any clue why this happens and which is the best result?

I have put the code on pastebin for you:

  • Data
  • C++ code
  • Python code: http://pastebin.com/jZVyzMkS

Thank you for helping!

EDIT: here's the plot related to SirJohnFranklin's post: see the comment below.

asked by Captain Sandwich

1 Answer

Did you actually try plotting the ODR and leastsq fits side by side? They look basically identical:

[plot: data with the ODR and leastsq fit curves overlaid; the two curves are visually indistinguishable]
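
Something along these lines reproduces the comparison (x, y, y_err, the model function f and the two fitted parameter vectors are assumed from the question):

    import numpy as np
    import matplotlib.pyplot as plt

    xs = np.linspace(x.min(), x.max(), 500)
    plt.errorbar(x, y, yerr=y_err, fmt='k.', label='data')
    plt.plot(xs, f(xs, beta_odr), label='ODR (fit_type=2)')
    plt.plot(xs, f(xs, beta_lsq), label='leastsq')
    plt.legend(loc='best')
    plt.show()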

Consider what the parameters correspond to - the step function described by beta[0] and beta[1], the initial and final values, explains by far the majority of the variance in your data. By contrast, small changes in beta[2] and beta[3], the inflexion point and slope, will have comparatively little effect on the overall shape of the curve and therefore the residual variance for the fit. It's therefore no surprise that these parameters have high standard errors, and are fitted slightly differently by the two algorithms.

The overall greater standard errors reported by ODR are because this model incorporates the errors in the y-values, whereas the ordinary least squares fit does not: errors in the measured y-values ought to reduce our confidence in the estimated fit parameters.
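
If you want to compare the two error estimates numerically rather than visually, note that the ODR output already reports scaled standard errors (sd_beta), whereas the leastsq covariance has to be scaled by the residual variance by hand, as you did. A rough sketch, assuming the output object and the cov and res_var values from the sketches in your question:

    import numpy as np

    odr_err = output.sd_beta                     # already scaled by the residual variance
    lsq_err = np.sqrt(np.diag(cov) * res_var)    # scaled by hand

    for i, (e1, e2) in enumerate(zip(odr_err, lsq_err)):
        print('beta[%d]: ODR +/- %.3g, leastsq +/- %.3g' % (i, e1, e2))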

answered by ali_m