I am trying to do a polynomial fit on a set of data using the numpy package.
The following code runs successfully. The fitted line only seems to match the data once the order reaches around 20 (very high). However, at the end it warns: "Polyfit may be poorly conditioned".
If I am not wrong, this means that as the degree gets higher, the fit becomes more sensitive to the data, i.e. easily influenced by it? How can I fix this?
import numpy as np
import matplotlib.pyplot as plt

def gen_data_9(length=5000):
    x = 2.0 * (np.random.rand(length) - 0.5) * np.pi * 2.0
    f = lambda x: np.exp(-x**2) * (-x) * 5 + x / 3
    y = f(x) + np.random.randn(len(x)) * 0.5
    return x, y, f

x, y, f = gen_data_9()

fig, ax = plt.subplots(3, 3, figsize=(16, 16))
for n in range(3):
    for k in range(3):
        order = 20 * n + 10 * k + 1
        z = np.polyfit(x, y, order)
        p = np.poly1d(z)
        ax[n, k].scatter(x, y, label="Real data", s=1)
        ax[n, k].scatter(x, p(x), label="Polynomial with order={}".format(order),
                         color='C1', s=1)
        ax[n, k].legend()
fig.show()
In Python, NumPy's polyfit() performs a least-squares polynomial fit: given coordinate points (X, Y) and a degree deg, it returns the vector of coefficients p that minimizes the squared error of a polynomial p(X) of degree deg, ordered from the deg term down to the constant term. It takes three main inputs, namely X, Y, and the polynomial degree, where X and Y are the values we want to fit on the two axes.
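As a minimal sketch of that interface (the data here is illustrative, not the question's):

```python
import numpy as np

# Noisy samples of y = x**2, seeded for reproducibility.
x = np.linspace(-1, 1, 50)
y = x**2 + np.random.default_rng(0).normal(scale=0.01, size=x.size)

coeffs = np.polyfit(x, y, deg=2)  # coefficients, highest power first
p = np.poly1d(coeffs)             # callable polynomial for evaluation
```

The recovered coefficients should be close to [1, 0, 0], and p can then be evaluated at any point.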
TL;DR: In this case the warning means: use a lower order!
To quote the documentation:
Note that fitting polynomial coefficients is inherently badly conditioned when the degree of the polynomial is large or the interval of sample points is badly centered. The quality of the fit should always be checked in these cases. When polynomial fits are not satisfactory, splines may be a good alternative.
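Following the documentation's suggestion, a smoothing spline handles this kind of data without any conditioning problem. A sketch assuming scipy is available (the smoothing factor s is a tunable assumption, set here via the common heuristic s ≈ n·σ²):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.sort(2.0 * (rng.random(5000) - 0.5) * np.pi * 2.0)  # spline needs ascending x
f = lambda t: np.exp(-t**2) * (-t) * 5 + t / 3
y = f(x) + rng.normal(scale=0.5, size=x.size)

# s controls the smoothing trade-off; n * noise_variance is a standard starting point.
spl = UnivariateSpline(x, y, s=len(x) * 0.5**2)

xp = np.linspace(x.min(), x.max(), 1000)
fit_error = np.mean(np.abs(spl(xp) - f(xp)))
```

Unlike a high-order global polynomial, the spline stays stable between and beyond the dense sample points.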
In other words, the warning tells you to double-check the results. If they seem fine, don't worry. But are they fine? To answer that, you should evaluate the resulting fit not only on the data points used for fitting (those often match rather well, especially when overfitting). Consider this:
xp = np.linspace(-1, 1, 10000) * 2 * np.pi

fig, ax = plt.subplots(3, 3, figsize=(16, 16))
for n in range(3):
    for k in range(3):
        order = 20 * n + 10 * k + 1
        print(order)
        z = np.polyfit(x, y, order)
        p = np.poly1d(z)
        ax[n, k].scatter(x, y, label="Real data", s=1)
        ax[n, k].plot(xp, p(xp), label="Polynomial with order={}".format(order), color='C1')
        ax[n, k].legend()
Here we evaluate the fitted polynomial on points spaced much more finely than the sample data. This is the result:
You can see that for orders 40 and above the results really shoot off. This coincides with the warnings I get.
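If a moderately high order is genuinely needed, the newer numpy.polynomial API is better conditioned than np.polyfit, because its fit methods rescale the sample interval to [-1, 1] internally. A sketch using a Chebyshev basis (degree 21 is an illustrative choice, roughly where the question's fit started to look good):

```python
import numpy as np
from numpy.polynomial import Chebyshev

rng = np.random.default_rng(0)
x = 2.0 * (rng.random(5000) - 0.5) * np.pi * 2.0
f = lambda t: np.exp(-t**2) * (-t) * 5 + t / 3
y = f(x) + rng.normal(scale=0.5, size=x.size)

# Chebyshev.fit maps x into [-1, 1] internally, so the design matrix
# stays far better conditioned than np.polyfit's raw monomial basis.
cheb = Chebyshev.fit(x, y, deg=21)

xp = np.linspace(x.min(), x.max(), 1000)
max_error = np.max(np.abs(cheb(xp) - f(xp)))
```

This fits the same degree without triggering the conditioning warning, though the advice above still applies: check the fit on a fine grid, not just on the sample points.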