Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

numpy.polyfit versus scipy.odr

I have a data set which in theory is described by a polynomial of the second degree. I would like to fit this data and I have used numpy.polyfit to do this. However, the down side is that the error on the returned coefficients is not available. Therefore I decided to also fit the data using scipy.odr. The weird thing was that the coefficients for the polynomial deviated from each other.

I do not understand this and therefore decided to test both fitting routines on a set of data that I produce my self:

import numpy
import scipy.odr
import matplotlib.pyplot as plt

x = numpy.arange(-20, 20, 0.1)
y = 1.8 * x**2 -2.1 * x + 0.6 + numpy.random.normal(scale = 100, size = len(x))

#Define function for scipy.odr
def fit_func(p, t):
  return p[0] * t**2 + p[1] * t + p[2]

#Fit the data using numpy.polyfit
fit_np = numpy.polyfit(x, y, 2)

#Fit the data using scipy.odr
Model = scipy.odr.Model(fit_func)
Data = scipy.odr.RealData(x, y)
Odr = scipy.odr.ODR(Data, Model, [1.5, -2, 1], maxit = 10000)
output = Odr.run()
#output.pprint()
beta = output.beta
betastd = output.sd_beta

print "poly", fit_np
print "ODR", beta

plt.plot(x, y, "bo")
plt.plot(x, numpy.polyval(fit_np, x), "r--", lw = 2)
plt.plot(x, fit_func(beta, x), "g--", lw = 2)

plt.tight_layout()

plt.show()

An example of an outcome is as follows:

poly [ 1.77992643 -2.42753714  3.86331152]
ODR [   3.8161735   -23.08952492 -146.76214989]

enter image description here

In the included image, the solution of numpy.polyfit (red dashed line) corresponds pretty well. The solution of scipy.odr (green dashed line) is basically completely off. I do have to note that the difference between numpy.polyfit and scipy.odr was less in the actual data set I wanted to fit. However, I do not understand where the difference between the two comes from, why in my own testing example the difference is extremely big, and which fitting routine is better?

I hope you can provide answers that might help me give a better understanding between the two fitting routines and in the process provide answers to the questions I have.

like image 312
The Dude Avatar asked Mar 19 '23 00:03

The Dude


1 Answers

In the way you are using ODR it does a full orthogonal distance regression. To have it do a normal nonlinear least squares fit add

Odr.set_job(fit_type=2)

before starting the optimization and you will get what you expected.

fit result

The reason that the full ODR fails so badly is due to not specifying weights/standard deviations. Obviously it does hard to interpret that point cloud and assumes equal wheights for x and y. If you provide estimated standard deviations, odr will yield a good (though different of course) result, too.

Data = scipy.odr.RealData(x, y, sx=0.1, sy=10)
like image 54
Christian K. Avatar answered Mar 29 '23 16:03

Christian K.