
ValueError: illegal value in 4-th argument of internal None when running sklearn LinearRegression().fit()

For some reason I cannot get this block of code to run properly anymore:

import numpy as np
from sklearn.linear_model import LinearRegression

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)
Traceback (most recent call last):
  File "<input>", line 2, in <module>
  File "C:\Python37\lib\site-packages\sklearn\linear_model\_base.py", line 547, in fit
    linalg.lstsq(X, y)
  File "C:\Python37\lib\site-packages\scipy\linalg\basic.py", line 1224, in lstsq
    % (-info, lapack_driver))
ValueError: illegal value in 4-th argument of internal None

I'm not sure why I'm getting this error on such a simple example. Here are my current versions:

scipy.__version__
'1.5.0'
sklearn.__version__
'0.23.1'

I'm running this on 64-bit Windows 10 Enterprise with Python 3.7.3. I've tried uninstalling and reinstalling scipy and scikit-learn, installing an earlier version of scipy, and uninstalling and reinstalling Python. None of these solved the issue.
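When reporting a problem like this, it can help to collect all the relevant versions in one go. The following is only a sketch: the helper name report_versions is made up for illustration, and it assumes Python 3.8 or newer, where importlib.metadata is part of the standard library (on 3.7 the same information comes from each package's __version__).

```python
from importlib import metadata  # standard library from Python 3.8 onwards
import platform
import sys


def report_versions(packages=("numpy", "scipy", "scikit-learn", "matplotlib")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Print the interpreter, the OS and one line per package.
print("python", sys.version.split()[0], "on", platform.platform())
for name, version in report_versions().items():
    print(name, version or "not installed")
```

Running it once inside PyCharm and once from PowerShell would show whether both are using the same interpreter and packages.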

Update: The error appears to be tied to matplotlib as well. I was previously running this example in PyCharm, but I've moved to running it directly from PowerShell. If I run this code outside of PyCharm, I do not get an error:

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)

However, if I plot the data before fitting, I get an error:

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Plot data
plt.scatter(x, y)
plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red')

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)
 ** On entry to DLASCLS parameter number  4 had an illegal value
Traceback (most recent call last):
  File ".\run.py", line 18, in <module>
    lm.fit(x.reshape(-1, 1), y)
  File "C:\Python37\lib\site-packages\sklearn\linear_model\_base.py", line 547, in fit
    linalg.lstsq(X, y)
  File "C:\Python37\lib\site-packages\scipy\linalg\basic.py", line 1224, in lstsq
    % (-info, lapack_driver))
ValueError: illegal value in 4-th argument of internal None

But if I comment out the line plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red') it works fine.

evan.tuck asked Jun 24 '20 18:06



3 Answers

It seems the error only occurs once you plot a figure with matplotlib; otherwise you can run the fit as many times as you like.

However, if you change the data type from float64 to float32 (Grzesik's answer), strangely enough the error disappears. This feels like a bug to me. Why would changing the data type affect the interaction between matplotlib and the LAPACK routine called within sklearn?

More a question than an answer, but it is a bit scary to find these unexpected interactions across functions and data types.

import numpy as np
import sklearn
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt


def main(print_matplotlib=False,dtype=np.float64):
    x = np.linspace(-3,3,100).astype(dtype)
    print(x.dtype)
    y = 2*np.random.rand(x.shape[0])*x + np.random.rand(x.shape[0])
    x = x.reshape((-1,1))

    reg=LinearRegression().fit(x,y)
    print(reg.intercept_,reg.coef_)
    
    yh = reg.predict(x)
    
    if print_matplotlib:
        plt.scatter(x,y)
        plt.plot(x,yh)
        plt.show()


No plotting

if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = False, dtype=np.float64)
    np.random.seed(64)
    main(print_matplotlib = False, dtype=np.float64)  
    pass

float64
0.5957165420019624 [0.91960601]
float64
0.5957165420019624 [0.91960601]

Plotting dtype = np.float64

if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float64)
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float64)
    pass

float64
0.5957165420019624 [0.91960601]

Plot 1

float64
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-52593a548324> in <module>
      3     main(print_matplotlib = True)
      4     np.random.seed(64)
----> 5     main(print_matplotlib = True)
      6 
      7     pass

<ipython-input-1-11139051f2d3> in main(print_matplotlib, dtype)
     11     x = x.reshape((-1,1))
     12 
---> 13     reg=LinearRegression().fit(x,y)
     14     print(reg.intercept_,reg.coef_)
     15 

~\Anaconda3\lib\site-packages\sklearn\linear_model\_base.py in fit(self, X, y, sample_weight)
    545         else:
    546             self.coef_, self._residues, self.rank_, self.singular_ = \
--> 547                 linalg.lstsq(X, y)
    548             self.coef_ = self.coef_.T
    549 

~\AppData\Roaming\Python\Python37\site-packages\scipy\linalg\basic.py in lstsq(a, b, cond, overwrite_a, overwrite_b, check_finite, lapack_driver)
   1249         if info < 0:
   1250             raise ValueError('illegal value in %d-th argument of internal %s'
-> 1251                              % (-info, lapack_driver))
   1252         resids = np.asarray([], dtype=x.dtype)
   1253         if m > n:

ValueError: illegal value in 4-th argument of internal None

Plotting dtype=np.float32

if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float32)
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float32)
    pass

Output 2

Alberto GR answered Oct 17 '22 10:10



As of numpy 1.19.1 and sklearn v0.23.2, I found that polyfit(deg=1) and LinearRegression().fit() raised unexpected errors for no apparent reason. And no, the data didn't contain any NaN or Inf values. I eventually used scipy.stats.linregress() instead.

from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(x.astype(np.float32), y.astype(np.float32))
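For a single feature, as in the question, ordinary least squares also has a closed form (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)), so the fit can be done without any LAPACK call at all. The sketch below uses only the standard library; the function name fit_line is made up for illustration, and it is not meant as a replacement for linregress, which also returns r, p and the standard error.

```python
import random


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Same kind of synthetic data as in the question: y = 2x + 3 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 100) for _ in range(1000)]
ys = [2.0 * x + 3.0 + rng.gauss(0, 10) for x in xs]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With 1000 noisy points the estimates should land close to the true slope 2 and intercept 3.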
Tae-Sung Shin answered Oct 17 '22 10:10



First, check for NaN and Inf values, and also try normalize=True:

# X, y stand for your feature matrix and target vector
lreg = LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit(X, y)
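The NaN/Inf check mentioned above could look roughly like this. It is a standard-library sketch for a flat sequence of numbers, and the helper name non_finite_positions is made up for illustration.

```python
import math


def non_finite_positions(values):
    """Return the indices of entries that are NaN, +Inf or -Inf."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]


sample = [1.0, float("nan"), 3.5, float("inf"), -2.0]
bad = non_finite_positions(sample)
print(bad)  # [1, 3]
```

For numpy arrays of any shape, np.isfinite(arr).all() answers the same question in one call.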

But these didn't work for me; my data didn't have any NaN or Inf values either. While experimenting, however, I found that running the same code a second time works. Hence I did this:

try:
    lreg = LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit(X, y)
except ValueError:
    # The first attempt raised the LAPACK ValueError; try once more.
    lreg = LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit(X, y)

I don't know why this works, but it solved the problem for me: simply running the same code twice did the trick.
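The run-it-twice trick can be wrapped in a small helper so the fit call is written only once. This is just a sketch: fit_with_retry and flaky_fit are made-up names, and flaky_fit merely simulates a call that fails the first time.

```python
def fit_with_retry(fit, retries=1, exceptions=(ValueError,)):
    """Call fit(); if it raises one of `exceptions`, retry up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return fit()
        except exceptions:
            if attempt == retries:
                raise  # out of retries: surface the last error


# Stand-in for a fit that fails on the first call and succeeds on the second.
calls = {"n": 0}


def flaky_fit():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ValueError("illegal value in 4-th argument of internal None")
    return "fitted"


result = fit_with_retry(flaky_fit)
print(result, calls["n"])  # fitted 2
```

With sklearn it would be called as fit_with_retry(lambda: LinearRegression().fit(X, y)). Catching only ValueError keeps unrelated errors visible.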

Suraj Mahangade answered Oct 17 '22 10:10
