How to fix "polyfit maybe poorly conditioned" in numpy?

Tags: python, numpy

I am trying to fit a polynomial to a set of data using NumPy's polyfit.

The code below runs successfully. The fitted curve only seems to match the data once the order reaches around 20 (very high). However, at the end NumPy warns: "Polyfit may be poorly conditioned".

If I understand correctly, the higher the degree, the more sensitive the fit becomes to the data, i.e. it is easily influenced by small changes in the data. How can I fix this?

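import numpy as np
import matplotlib.pyplot as plt

def gen_data_9(length=5000):
    # x is uniform in [-2*pi, 2*pi); y is f(x) plus Gaussian noise
    x = 2.0 * (np.random.rand(length) - 0.5) * np.pi * 2.0
    f = lambda x: np.exp(-x**2) * (-x) * 5 + x / 3
    y = f(x) + np.random.randn(len(x)) * 0.5
    return x, y, f

# Generate the data used below (this call was missing from the snippet)
x, y, f = gen_data_9()

fig, ax = plt.subplots(3, 3, figsize=(16, 16))

for n in range(3):
    for k in range(3):
        # Orders 1, 11, 21 / 21, 31, 41 / 41, 51, 61
        order = 20*n + 10*k + 1
        z = np.polyfit(x, y, order)
        p = np.poly1d(z)

        ax[n, k].scatter(x, y, label="Real data", s=1)
        ax[n, k].scatter(x, p(x), label="Polynomial with order={}".format(order),
                         color='C1', s=1)
        # One legend per subplot, so this belongs inside the inner loop
        ax[n, k].legend()

fig.show()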
asked Apr 18 '18 14:04 by Airpen101


1 Answer

TL;DR: In this case the warning means: use a lower order!

To quote the documentation:

Note that fitting polynomial coefficients is inherently badly conditioned when the degree of the polynomial is large or the interval of sample points is badly centered. The quality of the fit should always be checked in these cases. When polynomial fits are not satisfactory, splines may be a good alternative.
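
To get a feel for the conditioning the documentation mentions, here is a rough sketch (not part of the original answer) that measures the condition number of the Vandermonde matrix behind the least-squares problem as the order grows. The sample size, the seed and the helper name scaled_vander_cond are assumptions made for illustration. Each column is normalized before measuring, which I believe is similar to what np.polyfit does internally, so treat the numbers as indicative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
# Same sampling interval as the question: x uniform in [-2*pi, 2*pi)
x = rng.uniform(-2 * np.pi, 2 * np.pi, 5000)

def scaled_vander_cond(x, order):
    """Condition number of the Vandermonde matrix with unit-norm columns."""
    v = np.vander(x, order + 1)            # columns x**order, ..., x**0
    v = v / np.sqrt((v * v).sum(axis=0))   # normalize each column
    return np.linalg.cond(v)

# A few of the orders from the question's grid
conds = {order: scaled_vander_cond(x, order) for order in (1, 11, 21, 31)}
for order, c in conds.items():
    print(f"order {order:2d}: condition number ~ {c:.2e}")
```

The condition number should climb quickly with the order. Once it approaches the reciprocal of floating-point precision (about 1e16 for float64), the coefficients can no longer be determined reliably, which is the situation the warning flags.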

In other words, the warning tells you to double-check the results. If they seem fine, don't worry. But are they fine? To answer that, you should evaluate the resulting fit not only at the data points used for fitting (these often match rather well, especially when overfitting) but also at points in between. Consider this:

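# Reuses x and y from the question's code
# Fine, evenly spaced grid covering the same interval [-2*pi, 2*pi]
xp = np.linspace(-1, 1, 10000) * 2 * np.pi

# Fresh figure so the new curves are not drawn over the earlier plot
fig, ax = plt.subplots(3, 3, figsize=(16, 16))

for n in range(3):
    for k in range(3):

        order = 20*n+10*k+1
        print(order)  # shows which order triggers the warning
        z = np.polyfit(x,y,order)
        p = np.poly1d(z)

        ax[n,k].scatter(x,y,label = "Real data",s=1)
        # Draw the fit as a line on the fine grid, not only at the sample points
        ax[n,k].plot(xp,p(xp),label = "Polynomial with order={}".format(order), color='C1')
        ax[n,k].legend()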

Here we evaluate the polyfit on points spaced much more finely than the sample data. This is the result:

[Image: 3x3 grid of plots showing the data and the fitted polynomial for each order, evaluated on xp]

You can see that for orders 40 and above (41, 51 and 61 in this grid) the results really shoot off. This coincides with the warnings I get.
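
One way to put the "use a lower order" advice into practice is to compare orders on data that was not used for fitting. The sketch below is an illustration under stated assumptions, not part of the original answer: it regenerates data with the same model as gen_data_9 (with an arbitrary fixed seed), holds out the last 1000 points (an arbitrary split), and prints the root-mean-square error on the held-out points for a few hand-picked orders below the range where the warning appeared.

```python
import warnings
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Same data model as gen_data_9 in the question
f = lambda x: np.exp(-x**2) * (-x) * 5 + x / 3
x = rng.uniform(-2 * np.pi, 2 * np.pi, 5000)
y = f(x) + rng.standard_normal(x.size) * 0.5

# Hypothetical split: first 4000 points for fitting, last 1000 held out
x_fit, y_fit = x[:4000], y[:4000]
x_val, y_val = x[4000:], y[4000:]

val_rmse = {}
for order in (1, 5, 11, 15, 21):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence any conditioning warning here
        coeffs = np.polyfit(x_fit, y_fit, order)
    # Error on points the fit has never seen
    resid = np.polyval(coeffs, x_val) - y_val
    val_rmse[order] = np.sqrt(np.mean(resid**2))
    print(f"order {order:2d}: held-out RMSE = {val_rmse[order]:.3f}")
```

Since the noise has standard deviation 0.5, a held-out error close to 0.5 suggests the polynomial has captured the underlying curve. Raising the order beyond that point adds conditioning problems without improving the fit.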

answered Nov 06 '22 22:11 by MB-F