For some reason I cannot get this block of code to run properly anymore:
import numpy as np
from sklearn.linear_model import LinearRegression
# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))
# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)
Traceback (most recent call last):
  File "<input>", line 2, in <module>
  File "C:\Python37\lib\site-packages\sklearn\linear_model\_base.py", line 547, in fit
    linalg.lstsq(X, y)
  File "C:\Python37\lib\site-packages\scipy\linalg\basic.py", line 1224, in lstsq
    % (-info, lapack_driver))
ValueError: illegal value in 4-th argument of internal None
I'm not sure why I'm getting this error on such a simple example. Here are my current versions:
scipy.__version__
'1.5.0'
sklearn.__version__
'0.23.1'
I'm running this on 64-bit Windows 10 Enterprise with Python 3.7.3. I've tried uninstalling and reinstalling scipy and scikit-learn, trying earlier versions of scipy, and uninstalling and reinstalling Python. None of these solved the issue.
Update: It appears to be tied to matplotlib too. I was previously running this example in PyCharm, but I've moved to running it directly from PowerShell. If I run this code outside of PyCharm, I do not get an error:
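Because the error seems to depend on the exact combination of installed libraries, it can help to report all of the relevant versions in one go. The following is a minimal sketch rather than part of the original question: the helper name `installed_versions` is hypothetical, and it simply reads each module's `__version__` attribute, as was done by hand above.

```python
import importlib


def installed_versions(module_names):
    """Return {module name: version string, or None if the import fails}."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module is not installed in this environment.
            versions[name] = None
            continue
        # Not every module defines __version__, so fall back to "unknown".
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


versions = installed_versions(["numpy", "scipy", "sklearn", "matplotlib"])
for name, version in versions.items():
    print(name, version)
```

Including matplotlib and numpy in the report matters here, since the failure only shows up when plotting is involved.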
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))
# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)
However, if I plot the data before fitting, I get an error:
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))
# Plot data
plt.scatter(x, y)
plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red')
# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)
** On entry to DLASCLS parameter number 4 had an illegal value
Traceback (most recent call last):
  File ".\run.py", line 18, in <module>
    lm.fit(x.reshape(-1, 1), y)
  File "C:\Python37\lib\site-packages\sklearn\linear_model\_base.py", line 547, in fit
    linalg.lstsq(X, y)
  File "C:\Python37\lib\site-packages\scipy\linalg\basic.py", line 1224, in lstsq
    % (-info, lapack_driver))
ValueError: illegal value in 4-th argument of internal None
But if I comment out the line plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red')
it works fine.
It seems to happen only when you display the figure with matplotlib; otherwise you can run the fit as many times as you like.
However, if you change the data type from float64 to float32 (see Grzesik's answer), the error strangely disappears. It feels like a bug to me. Why would changing the data type affect the interaction between matplotlib and the LAPACK function that sklearn calls?
More a question than an answer, but it is a bit scary to find these unexpected interactions across functions and data types.
import numpy as np
import sklearn
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
def main(print_matplotlib=False,dtype=np.float64):
    x = np.linspace(-3,3,100).astype(dtype)
    print(x.dtype)
    y = 2*np.random.rand(x.shape[0])*x + np.random.rand(x.shape[0])
    x = x.reshape((-1,1))
    reg=LinearRegression().fit(x,y)
    print(reg.intercept_,reg.coef_)
    yh = reg.predict(x)
    if print_matplotlib:
        plt.scatter(x,y)
        plt.plot(x,yh)
        plt.show()
No plotting
if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = False, dtype=np.float64)
    np.random.seed(64)
    main(print_matplotlib = False, dtype=np.float64)
    pass
float64
0.5957165420019624 [0.91960601]
float64
0.5957165420019624 [0.91960601]
Plotting dtype = np.float64
if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float64)
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float64)
    pass
float64
0.5957165420019624 [0.91960601]
Plot 1
float64
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-52593a548324> in <module>
3 main(print_matplotlib = True)
4 np.random.seed(64)
----> 5 main(print_matplotlib = True)
6
7 pass
<ipython-input-1-11139051f2d3> in main(print_matplotlib, dtype)
11 x = x.reshape((-1,1))
12
---> 13 reg=LinearRegression().fit(x,y)
14 print(reg.intercept_,reg.coef_)
15
~\Anaconda3\lib\site-packages\sklearn\linear_model\_base.py in fit(self, X, y, sample_weight)
545 else:
546 self.coef_, self._residues, self.rank_, self.singular_ = \
--> 547 linalg.lstsq(X, y)
548 self.coef_ = self.coef_.T
549
~\AppData\Roaming\Python\Python37\site-packages\scipy\linalg\basic.py in lstsq(a, b, cond, overwrite_a, overwrite_b, check_finite, lapack_driver)
1249 if info < 0:
1250 raise ValueError('illegal value in %d-th argument of internal %s'
-> 1251 % (-info, lapack_driver))
1252 resids = np.asarray([], dtype=x.dtype)
1253 if m > n:
ValueError: illegal value in 4-th argument of internal None
Plotting dtype=np.float32
if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float32)
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float32)
    pass
Output 2
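The float32 workaround described in this answer can be sketched on the data from the question as follows. This is only an illustration under stated assumptions: the helper name `fit_line_float32` is hypothetical, and the thread does not explain why the cast helps, so it may not avoid the error in every affected environment.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_line_float32(x, y):
    """Cast both inputs to float32 before fitting (hypothetical helper)."""
    X32 = np.asarray(x, dtype=np.float32).reshape(-1, 1)
    y32 = np.asarray(y, dtype=np.float32)
    return LinearRegression().fit(X32, y32)


# Same kind of noisy linear data as in the question: y = 2x + 3 + noise.
rng = np.random.default_rng(64)
x = rng.uniform(0, 100, 1000)
y = 2.0 * x + 3.0 + rng.normal(0, 10, len(x))

model = fit_line_float32(x, y)
print(model.intercept_, model.coef_)
```

Note that float32 carries roughly seven significant decimal digits, so the fitted coefficients can differ slightly from a float64 fit.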
As of numpy 1.19.1 and sklearn v0.23.2, I found that both polyfit(deg=1) and LinearRegression().fit() raised unexpected errors for no apparent reason. The data did not contain any NaN or Inf values. I eventually used scipy.stats.linregress() instead:
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(x.astype(np.float32), y.astype(np.float32))
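For reference, here is a self-contained sketch of the `scipy.stats.linregress` alternative applied to data like that in the question. The synthetic data and the variable names are assumptions for illustration; the float32 cast used in the answer above is optional and is omitted here.

```python
import numpy as np
from scipy import stats

# Noisy linear data, generated the same way as in the question.
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 1000)
y = 2.0 * x + 3.0 + rng.normal(0, 10, len(x))

# linregress takes 1-D arrays directly, so no reshape(-1, 1) is needed.
result = stats.linregress(x, y)
print(result.slope, result.intercept, result.rvalue)
```

`linregress` only handles a single explanatory variable, so it replaces `LinearRegression` only for simple one-feature fits like this one.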
First, check for NaN and Inf values, and also try normalize=True:
lreg=LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit()
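The NaN/Inf check mentioned above can be sketched with `np.isfinite`. The helper name `count_non_finite` and the sample arrays are hypothetical and only serve to show the idea.

```python
import numpy as np


def count_non_finite(*arrays):
    """Return, for each array, how many entries are NaN or +/-Inf."""
    return [
        int(np.count_nonzero(~np.isfinite(np.asarray(a, dtype=float))))
        for a in arrays
    ]


# Small sample arrays with one bad value each.
x = np.array([1.0, 2.0, np.nan, 4.0])
y = np.array([0.5, np.inf, 1.5, 2.0])

bad_x, bad_y = count_non_finite(x, y)
print(bad_x, bad_y)
```

If both counts are zero, bad input values can be ruled out as the cause of the error.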
But neither of these worked for me, and my data didn't contain any NaN or Inf values. While experimenting, I found that running the same code a second time works, so I did this:
try:
    lreg=LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit()
except Exception:
    # The first call sometimes fails; a second identical call succeeds.
    lreg=LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit()
I don't know why this works, but running the same code twice solved the problem for me.
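The retry idea can be factored into a small helper so the fit call is not written out twice. This is a sketch under assumptions: `call_with_retry` and `flaky_fit` are hypothetical names, and `flaky_fit` is only a stand-in that fails once, not a real sklearn call.

```python
def call_with_retry(func, retries=1, exceptions=(ValueError,)):
    """Call func(); on a listed exception, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return func()
        except exceptions:
            # Re-raise once the retry budget is used up.
            if attempt == retries:
                raise


# Stand-in for a fit that fails on the first call and succeeds afterwards.
calls = {"count": 0}


def flaky_fit():
    calls["count"] += 1
    if calls["count"] == 1:
        raise ValueError("illegal value in 4-th argument of internal None")
    return "fitted"


result = call_with_retry(flaky_fit)
print(result, calls["count"])
```

In real use, `func` could be a lambda wrapping the actual `LinearRegression(...).fit(...)` call. Retrying hides the underlying problem rather than fixing it, so this is best treated as a stopgap.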