How to fix "polyfit may be poorly conditioned" in numpy?

Tags:

numpy

I am trying to do a polyfit on a set of data using the numpy package.

The code below runs successfully, and the fitted curve seems to match the data once the order reaches around 20 (very high). However, at the end it warns "Polyfit may be poorly conditioned".

If I understand correctly, a higher degree makes the fit more sensitive to the data, i.e. easily influenced by it. How can I fix this?

import numpy as np
import matplotlib.pyplot as plt

def gen_data_9(length=5000):
    # x is uniform on [-2*pi, 2*pi]; y is f(x) plus Gaussian noise
    x = 2.0 * (np.random.rand(length) - 0.5) * np.pi * 2.0
    f = lambda x: np.exp(-x**2) * (-x) * 5 + x / 3
    y = f(x) + np.random.randn(len(x)) * 0.5
    return x, y, f

x, y, f = gen_data_9()

fig, ax = plt.subplots(3, 3, figsize=(16, 16))

for n in range(3):
    for k in range(3):
        order = 20*n + 10*k + 1
        z = np.polyfit(x, y, order)
        p = np.poly1d(z)

        ax[n,k].scatter(x, y, label="Real data", s=1)
        ax[n,k].scatter(x, p(x), label="Polynomial with order={}".format(order),
                        color='C1', s=1)
        # legend must be drawn once per panel, so it belongs in the inner loop
        ax[n,k].legend()

fig.show()


asked Apr 18 '18 14:04

Airpen101

1 Answer

TL;DR: In this case the warning means: use a lower order!

To quote the documentation:

Note that fitting polynomial coefficients is inherently badly conditioned when the degree of the polynomial is large or the interval of sample points is badly centered. The quality of the fit should always be checked in these cases. When polynomial fits are not satisfactory, splines may be a good alternative.
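The conditioning that the documentation refers to can be inspected directly. The following is a minimal sketch, not part of the original code: it regenerates data with the same recipe as the question (the seed, the smaller sample size and the list of orders are assumptions made for illustration) and calls np.polyfit with full=True, which returns the effective rank and the singular values of the scaled design matrix instead of emitting the warning.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed here for reproducibility

# Same recipe as gen_data_9 in the question, with fewer points
x = rng.uniform(-2 * np.pi, 2 * np.pi, 2000)
y = np.exp(-x**2) * (-x) * 5 + x / 3 + rng.normal(scale=0.5, size=x.size)

report = {}
for order in (5, 21, 41, 61):
    # full=True returns diagnostics and does not emit the RankWarning
    coeffs, residuals, rank, sing_vals, rcond = np.polyfit(x, y, order, full=True)
    rank = int(rank)
    smallest = sing_vals.min()
    # ratio of largest to smallest singular value: a rough condition estimate
    cond = sing_vals.max() / smallest if smallest > 0 else np.inf
    report[order] = (rank, cond)
    status = "full rank" if rank == order + 1 else "rank deficient"
    print(f"order={order:2d}  rank={rank:2d}/{order + 1}  cond~{cond:.1e}  {status}")
```

A rank below order + 1 is the situation in which polyfit would normally issue the warning: some coefficients are then no longer determined by the data to within floating-point precision.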

In other words, the warning tells you to double-check the results. If they seem fine, don't worry. But are they fine? To answer that, evaluate the resulting fit not only on the data points used for fitting (these often match rather well, especially when overfitting) but also between them. Consider this:

xp = np.linspace(-1, 1, 10000) * 2 * np.pi

for n in range(3):
    for k in range(3):

        order = 20*n+10*k+1
        print(order)
        z = np.polyfit(x,y,order)
        p = np.poly1d(z)

        ax[n,k].scatter(x,y,label = "Real data",s=1)
        ax[n,k].plot(xp,p(xp),label = "Polynomial with order={}".format(order), color='C1')
        ax[n,k].legend()

Here we evaluate the polyfit on points spaced much more finely than the sample data. This is the result:

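[Figure: 3x3 grid of plots, one panel per polynomial order, showing the data and the fitted curve evaluated on the dense grid xp]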

You can see that for orders 40 and above the results really shoot off. This coincides with the warnings I get.
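If a moderately high order is still wanted, one option that may help with the conditioning (though not with overfitting) is to fit in a Chebyshev basis on a rescaled interval. The sketch below is an assumption-laden illustration rather than a fix for the original code: it uses numpy.polynomial.Chebyshev.fit, which maps the sample interval to [-1, 1] before fitting, and the seed and the order of 21 are arbitrary choices.

```python
import numpy as np
from numpy.polynomial import Chebyshev

rng = np.random.default_rng(0)  # fixed seed, assumed for reproducibility

# Same model as in the question: smooth function plus Gaussian noise
f = lambda t: np.exp(-t**2) * (-t) * 5 + t / 3
x = rng.uniform(-2 * np.pi, 2 * np.pi, 5000)
y = f(x) + rng.normal(scale=0.5, size=x.size)

order = 21  # illustrative choice, not a recommendation
cheb = Chebyshev.fit(x, y, order)  # domain defaults to [x.min(), x.max()]

# Evaluate on a dense grid inside the fitted domain and compare with the
# noise-free function, which is only known because the data is synthetic
xp = np.linspace(x.min(), x.max(), 1000)
fit = cheb(xp)
rmse = np.sqrt(np.mean((fit - f(xp)) ** 2))
print(cheb.domain, rmse)
```

The returned object is callable like np.poly1d, but its coefficients are Chebyshev coefficients on the mapped interval, so they are not interchangeable with the output of np.polyfit. The advice above still applies: check the fit between the sample points and prefer the lowest order that describes the data.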


answered Nov 06 '22 22:11

MB-F
