LinAlgError: SVD did not converge in Linear Least Squares when trying polyfit

When I run the script below, I get the error: LinAlgError: SVD did not converge in Linear Least Squares. The exact same script works on a similar dataset. I have searched my dataset for values that Python might interpret as NaN, but I cannot find any.

My dataset is quite large and impossible to check by hand (though I think it is fine). I also checked the lengths of stageheight_masked and discharge_masked, and they are the same. Does anyone know why my script raises this error and what I can do about it?

import numpy as np
import datetime
import matplotlib.dates
import matplotlib.pyplot as plt
from scipy import polyfit, polyval

kwargs = dict(delimiter = '\t',\
     skip_header = 0,\
     missing_values = 'NaN',\
     converters = {0:matplotlib.dates.strpdate2num('%d-%m-%Y %H:%M')},\
     dtype = float,\
     names = True,\
     )

rating_curve_Gillisstraat = np.genfromtxt('G:\Discharge_and_stageheight_Gillisstraat.txt',**kwargs)

discharge = rating_curve_Gillisstraat['discharge']   #change names of columns
stageheight = rating_curve_Gillisstraat['stage'] - 131.258

#mask NaN
discharge_masked = np.ma.masked_array(discharge,mask=np.isnan(discharge)).compressed()
stageheight_masked = np.ma.masked_array(stageheight,mask=np.isnan(discharge)).compressed()

#sort
sort_ind = np.argsort(stageheight_masked)
stageheight_masked = stageheight_masked[sort_ind]
discharge_masked = discharge_masked[sort_ind]

#regression
a1,b1,c1 = polyfit(stageheight_masked, discharge_masked, 2)
discharge_predicted = polyval([a1,b1,c1],stageheight_masked)

print 'regression coefficients'
print (a1,b1,c1)

#create upper and lower uncertainty
upper = discharge_predicted*1.15
lower = discharge_predicted*0.85

#create scatterplot

plt.scatter(stageheight,discharge,color='b',label='Rating curve')
plt.plot(stageheight_masked,discharge_predicted,'r-',label='regression line')
plt.plot(stageheight_masked,upper,'r--',label='15% error')
plt.plot(stageheight_masked,lower,'r--')
plt.axhline(y=1.6,xmin=0,xmax=1,color='black',label='measuring range')
plt.title('Rating curve Catsop')
plt.ylabel('discharge')
plt.ylim(0,2)
plt.xlabel('stageheight[m]')
plt.legend(loc='upper left', title='Legend')
plt.grid(True)
plt.show()
Toine Kerckhoffs asked Feb 23 '16 15:02


3 Answers

I don't have your data file, but when you get that error it is almost always the case that you have NaNs or infinities in your data. Look for both: np.isfinite flags NaN and infinity alike, while pd.notnull by default only flags NaN.
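A minimal sketch of that check, using small made-up arrays in place of the question's stageheight and discharge columns (the values are assumptions for illustration). Note that the script in the question builds both masks from np.isnan(discharge) only, so a NaN or inf in stageheight would pass through unmasked; that may or may not be the cause here, but filtering on both arrays rules it out.

```python
import numpy as np

# Hypothetical stand-ins for the question's stageheight and discharge columns.
stageheight = np.array([0.10, 0.20, np.nan, 0.40, 0.50, 0.60])
discharge = np.array([0.05, 0.12, 0.25, np.inf, 0.61, 0.83])

# Count NaN and +/-inf entries per array; np.isfinite is False for all of them.
print("non-finite stageheight values:", np.count_nonzero(~np.isfinite(stageheight)))
print("non-finite discharge values:", np.count_nonzero(~np.isfinite(discharge)))

# Keep only the rows where BOTH values are finite, so the arrays stay aligned.
valid = np.isfinite(stageheight) & np.isfinite(discharge)
stageheight_clean = stageheight[valid]
discharge_clean = discharge[valid]

# Second-degree fit on the cleaned data.
a1, b1, c1 = np.polyfit(stageheight_clean, discharge_clean, 2)
print("regression coefficients:", a1, b1, c1)
```

Because a single boolean mask is applied to both arrays, they keep the same length and the same row pairing after filtering.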

ski_squaw answered Nov 15 '22 00:11


As ski_squaw mentions, this error is usually caused by NaNs. For me, however, it appeared after a Windows update. I was using NumPy 1.16; switching to NumPy 1.19.3 solved the issue (run pip install numpy==1.19.3 --user in cmd).

This GitHub issue explains it in more detail: https://github.com/numpy/numpy/issues/16744

Note that NumPy 1.19.3 doesn't work on Linux and 1.19.4 doesn't work on Windows.
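One way to tell an installation problem from a data problem is to run the same kind of fit on synthetic data that is known to be clean. The sketch below is only a rough diagnostic; the quadratic and the sample points are made up for illustration.

```python
import platform
import numpy as np

print("NumPy", np.__version__, "on", platform.system())

# Synthetic data from a known quadratic, y = 2x^2 - 3x + 1, with no NaN or inf.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x**2 - 3.0 * x + 1.0

# If this raises LinAlgError, suspect the installation rather than the dataset.
coeffs = np.polyfit(x, y, 2)
print("recovered coefficients:", coeffs)
```

If this runs and recovers coefficients close to 2, -3 and 1, the NumPy build is probably fine and the dataset is the more likely culprit.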

Joris answered Nov 15 '22 00:11


As others have pointed out, the problem is likely that some rows contain no numeric values for the algorithm to work with. This is an issue for most regression methods.

The solution, then, is to handle those rows, and how to do that depends on the data. Often you can replace the NaNs with 0s, for example with pandas .fillna(0). Sometimes you have to interpolate the missing values, for which pandas .interpolate() is probably the simplest option. When the data is not a time series, you may be able to simply drop the rows containing NaNs, for example with the pandas .dropna() method. And sometimes the problem is not NaNs but infs or other invalid values, for which there are other solutions: https://stackoverflow.com/a/55293137/12213843

Which approach is right depends on the data, and interpreting the data is up to you. Domain knowledge goes a long way toward interpreting it well.
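The options above could look roughly like this in pandas. The DataFrame is made up for illustration; only the column names are borrowed from the question's file.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names mirror the question's file.
df = pd.DataFrame({
    "stage": [0.10, 0.20, np.nan, 0.40, 0.50, 0.60],
    "discharge": [0.05, np.inf, 0.25, 0.40, np.nan, 0.83],
})

# Treat +/-inf as missing so the pandas NaN tools handle it too.
df = df.replace([np.inf, -np.inf], np.nan)

filled = df.fillna(0)            # replace missing values with 0
interpolated = df.interpolate()  # linear interpolation between neighbours
dropped = df.dropna()            # discard any row with a missing value

print(dropped)
```

For a regression like the rating curve in the question, dropping incomplete rows is often the safer default, since zeros or interpolated values would be fitted as if they were real measurements.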

Robin answered Nov 14 '22 23:11