Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the error of numpy.polyfit?

I want to use numpy.polyfit for physical calculations, therefore I need the magnitude of the error.

like image 355
varantir Avatar asked Mar 30 '13 16:03

varantir


People also ask

What is Numpy Polyfit?

Introduction to NumPy polyfit. In python, Numpy polyfit() is a method that fits the data within a polynomial function. That is, it least squares the function polynomial fit. For example, a polynomial p(X) of deg degree fits the coordinate points (X, Y).

What is Numpy Polyfit return?

The np. polyfit() method takes a few parameters and returns a vector of coefficients p that minimizes the squared error in the order deg, deg-1, … 0. It least squares the polynomial fit.

How does Python Polyfit work?

polyfit() helps us by finding the least square polynomial fit. This means finding the best fitting curve to a given set of points by minimizing the sum of squares. It takes 3 different inputs from the user, namely X, Y, and the polynomial degree. Here X and Y represent the values that we want to fit on the 2 axes.

How do you fit a polynomial to data in Python?

To get the least-squares fit of a polynomial to data, use the polynomial. polyfit() in Python Numpy. The method returns the Polynomial coefficients ordered from low to high. If y was 2-D, the coefficients in column k of coef represent the polynomial fit to the data in y's k-th column.


2 Answers

If you specify full=True in your call to polyfit, it will include extra information:

>>> x = np.arange(100) >>> y = x**2 + 3*x + 5 + np.random.rand(100) >>> np.polyfit(x, y, 2) array([ 0.99995888,  3.00221219,  5.56776641]) >>> np.polyfit(x, y, 2, full=True) (array([ 0.99995888,  3.00221219,  5.56776641]), # coefficients  array([ 7.19260721]), # residuals  3, # rank  array([ 11.87708199,   3.5299267 ,   0.52876389]), # singular values  2.2204460492503131e-14) # conditioning threshold 

The residual value returned is the sum of the squares of the fit errors, not sure if this is what you are after:

>>> np.sum((np.polyval(np.polyfit(x, y, 2), x) - y)**2) 7.1926072073491056 

In version 1.7 there is also a cov keyword that will return the covariance matrix for your coefficients, which you could use to calculate the uncertainty of the fit coefficients themselves.

like image 129
Jaime Avatar answered Oct 09 '22 02:10

Jaime


As you can see in the documentation:

Returns ------- p : ndarray, shape (M,) or (M, K)     Polynomial coefficients, highest power first.     If `y` was 2-D, the coefficients for `k`-th data set are in ``p[:,k]``.  residuals, rank, singular_values, rcond : present only if `full` = True     Residuals of the least-squares fit, the effective rank of the scaled     Vandermonde coefficient matrix, its singular values, and the specified     value of `rcond`. For more details, see `linalg.lstsq`. 

Which means that if you can do a fit and get the residuals as:

 import numpy as np  x = np.arange(10)  y = x**2 -3*x + np.random.random(10)   p, res, _, _, _ = numpy.polyfit(x, y, deg, full=True) 

Then, the p are your fit parameters, and the res will be the residuals, as described above. The _'s are because you don't need to save the last three parameters, so you can just save them in the variable _ which you won't use. This is a convention and is not required.


@Jaime's answer explains what the residual means. Another thing you can do is look at those squared deviations as a function (the sum of which is res). This is particularly helpful to see a trend that didn't fit sufficiently. res can be large because of statistical noise, or possibly systematic poor fitting, for example:

x = np.arange(100) y = 1000*np.sqrt(x) + x**2 - 10*x + 500*np.random.random(100) - 250  p = np.polyfit(x,y,2) # insufficient degree to include sqrt  yfit = np.polyval(p,x)  figure() plot(x,y, label='data') plot(x,yfit, label='fit') plot(x,yfit-y, label='var') 

So in the figure, note the bad fit near x = 0:
polyfit

like image 26
askewchan Avatar answered Oct 09 '22 02:10

askewchan