LinAlgError: SVD did not converge in Linear Least Squares when trying polyfit

Question

When I run the script below I get the error: LinAlgError: SVD did not converge in Linear Least Squares. The exact same script works on a similar dataset. I have searched my dataset for values that Python might interpret as NaN, but I cannot find any.

My dataset is quite large and impossible to check by hand (but I think it is fine). I also checked the lengths of stageheight_masked and discharge_masked, and they are the same. Does anyone know why the script raises this error and what I can do about it?

import numpy as np
import datetime
import matplotlib.dates
import matplotlib.pyplot as plt
from scipy import polyfit, polyval

kwargs = dict(delimiter = '	',\
     skip_header = 0,\
     missing_values = 'NaN',\
     converters = {0:matplotlib.dates.strpdate2num('%d-%m-%Y %H:%M')},\
     dtype = float,\
     names = True,\
     )

rating_curve_Gillisstraat = np.genfromtxt('G:\Discharge_and_stageheight_Gillisstraat.txt',**kwargs)

discharge = rating_curve_Gillisstraat['discharge']   #change names of columns
stageheight = rating_curve_Gillisstraat['stage'] - 131.258

#mask NaN
discharge_masked = np.ma.masked_array(discharge,mask=np.isnan(discharge)).compressed()
stageheight_masked = np.ma.masked_array(stageheight,mask=np.isnan(discharge)).compressed()

#sort
sort_ind = np.argsort(stageheight_masked)
stageheight_masked = stageheight_masked[sort_ind]
discharge_masked = discharge_masked[sort_ind]

#regression
a1,b1,c1 = polyfit(stageheight_masked, discharge_masked, 2)
discharge_predicted = polyval([a1,b1,c1],stageheight_masked)

print 'regression coefficients'
print (a1,b1,c1)

#create upper and lower uncertainty
upper = discharge_predicted*1.15
lower = discharge_predicted*0.85

#create scatterplot

plt.scatter(stageheight,discharge,color='b',label='Rating curve')
plt.plot(stageheight_masked,discharge_predicted,'r-',label='regression line')
plt.plot(stageheight_masked,upper,'r--',label='15% error')
plt.plot(stageheight_masked,lower,'r--')
plt.axhline(y=1.6,xmin=0,xmax=1,color='black',label='measuring range')
plt.title('Rating curve Catsop')
plt.ylabel('discharge')
plt.ylim(0,2)
plt.xlabel('stageheight[m]')
plt.legend(loc='upper left', title='Legend')
plt.grid(True)
plt.show()

ski_squaw · Accepted Answer

I don't have your data file, but it is almost always the case that when you get that error you have NaNs or infinities in your data. Look for them using pd.notnull or np.isfinite (np.isfinite flags both NaN and infinity; pd.notnull only detects missing values such as NaN).
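
A minimal sketch of that check, using made-up arrays in place of the real file (the values are assumptions for illustration only). It keeps a row only when both columns are finite. Note that the question's script builds the mask for both arrays from np.isnan(discharge) alone, so a NaN or infinity in the stage column would pass through to polyfit.

```python
import numpy as np

# Synthetic stand-ins for the question's arrays (assumed values, not the real data)
stageheight = np.array([0.10, 0.20, np.nan, 0.40, 0.50, 0.60])
discharge = np.array([0.05, 0.12, 0.20, np.inf, 0.55, 0.80])

# Keep only rows where BOTH columns are finite;
# np.isfinite is False for NaN, +inf and -inf
finite = np.isfinite(stageheight) & np.isfinite(discharge)
print("rows dropped:", np.count_nonzero(~finite))

x = stageheight[finite]
y = discharge[finite]

# Second-degree fit as in the question, using numpy's own polyfit/polyval
a1, b1, c1 = np.polyfit(x, y, 2)
discharge_predicted = np.polyval([a1, b1, c1], x)
print(a1, b1, c1)
```

Printing the number of dropped rows is a quick way to confirm whether a large file contains non-finite values without inspecting it by hand.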

Joris · Answer

As ski_squaw mentions, the error is usually caused by NaNs. For me, however, it appeared after a Windows update. I was using NumPy 1.16, and moving to 1.19.3 solved the issue (run pip install numpy==1.19.3 --user in cmd).

This GitHub issue explains it in more detail: https://github.com/numpy/numpy/issues/16744

NumPy 1.19.3 doesn't work on Linux and 1.19.4 doesn't work on Windows.

Robin · Answer

As others have pointed out, the problem is likely that some rows contain no numeric values for the algorithm to work with. This is an issue with most regressions.

That's the problem. The solution is to do something about those rows, and what exactly depends on the data. Often you can replace the NaNs with 0s, for example with pandas .fillna(0). Sometimes you have to interpolate the missing values, and pandas .interpolate() is probably the simplest way to do that. When the data is not a time series, you may be able to simply drop the rows that contain NaNs, for example with pandas .dropna(). And sometimes the problem is not NaNs but infs or other values, for which there are other solutions: https://stackoverflow.com/a/55293137/12213843
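
The pandas options above can be sketched as follows. The DataFrame is hypothetical: the column names follow the question, but the values are made up. Converting infs to NaN first is one common way to let the same NaN tools handle them; it is an assumption here, not necessarily what the linked answer does.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; column names follow the question, values are made up
df = pd.DataFrame({
    "stage": [0.1, 0.2, np.nan, 0.4, 0.5],
    "discharge": [0.05, np.inf, 0.20, np.nan, 0.55],
})

# Treat +/-inf as missing so the NaN tools below handle them as well
df = df.replace([np.inf, -np.inf], np.nan)

filled = df.fillna(0)            # replace missing values with 0
interpolated = df.interpolate()  # linear interpolation between neighbours
dropped = df.dropna()            # discard any row with a missing value

print(len(df), len(dropped))
```

Each variant returns a new DataFrame, so the results can be compared side by side before deciding which one suits the data.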

Exactly which approach to take depends on the data, and it is up to you to interpret that data. Domain knowledge goes a long way toward interpreting it well.

Tags:

python

scipy

regression

Toine Kerckhoffs
