
SelectKBest for regression gives "unknown label type"-error

I am trying to get a slightly modified version of the SelectKBest example to work, but I keep getting a ValueError("Unknown label type: %s" % repr(ys)).

Here's my code:

# Importing dependencies
import numpy as np
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.datasets import load_iris

#The Example from:
#http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection
iris = load_iris()
X, Y = iris.data, iris.target
print(X.shape, type(X), type(X[0,0]))
print(Y.shape, type(Y), type(Y[0]))
X_new = SelectKBest(chi2, k=2).fit_transform(X, Y)

#My toy problem:
X = np.random.uniform(0,1, size=(5000, 10))
Y = np.random.uniform(0,1, size=(5000,))

#Type cast which might solve my problem, following this suggestion:
# https://stackoverflow.com/questions/45346550/valueerror-unknown-label-type-unknown
X=X.astype('float')
Y=Y.astype('float')

print(X.shape, type(X), type(X[0,0]))
print(Y.shape, type(Y), type(Y[0]))

X_new = SelectKBest(chi2, k=2).fit_transform(X, Y)

The input data is shaped in exactly the same way, and the input data types are almost the same:

(150, 4) <class 'numpy.ndarray'> <class 'numpy.float64'>
(150,) <class 'numpy.ndarray'> <class 'numpy.int32'>
(5000, 10) <class 'numpy.ndarray'> <class 'numpy.float64'>
(5000,) <class 'numpy.ndarray'> <class 'numpy.float64'>

However, my code crashes with the ValueError mentioned above. From reading the SelectKBest docs, switching from classification to regression shouldn't really be a problem. Can someone help me find out what's going wrong?

Traceback (most recent call last):
  File "featureSelection_toyproblem.py", line 26, in <module>
    X_new = SelectKBest(chi2, k=2).fit_transform(X, Y)
  File "C:\Users\mobrecht\AppData\Local\Continuum\anaconda3\envs\env_zipline\lib\site-packages\sklearn\base.py", line 520, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "C:\Users\mobrecht\AppData\Local\Continuum\anaconda3\envs\env_zipline\lib\site-packages\sklearn\feature_selection\univariate_selection.py", line 349, in fit
    score_func_ret = self.score_func(X, y)
  File "C:\Users\mobrecht\AppData\Local\Continuum\anaconda3\envs\env_zipline\lib\site-packages\sklearn\feature_selection\univariate_selection.py", line 217, in chi2
    Y = LabelBinarizer().fit_transform(y)
  File "C:\Users\mobrecht\AppData\Local\Continuum\anaconda3\envs\env_zipline\lib\site-packages\sklearn\preprocessing\label.py", line 307, in fit_transform
    return self.fit(y).transform(y)
  File "C:\Users\mobrecht\AppData\Local\Continuum\anaconda3\envs\env_zipline\lib\site-packages\sklearn\preprocessing\label.py", line 284, in fit
    self.classes_ = unique_labels(y)
  File "C:\Users\mobrecht\AppData\Local\Continuum\anaconda3\envs\env_zipline\lib\site-packages\sklearn\utils\multiclass.py", line 97, in unique_labels
    raise ValueError("Unknown label type: %s" % repr(ys))
ValueError: Unknown label type: (array([ 0.42595241,  0.79859991,  0.22947246, ...,  0.86011766,
        0.52335991,  0.27046173]),)
Mischa Obrecht asked Dec 13 '22 16:12

1 Answer

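Check the chi2 docs: chi2 can only be used for classification. As the traceback shows, chi2 passes y through LabelBinarizer, which expects discrete class labels. Your toy problem's targets Y are real values, not labels, so the call fails.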
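
As a minimal sketch of the difference, the snippet below uses scikit-learn's type_of_target to show how the two targets are classified, then swaps chi2 for f_regression, a scoring function meant for continuous targets. Choosing f_regression is my suggestion and an assumption about what fits the toy problem (mutual_info_regression is another option); the fixed seed is only there for reproducibility and is not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.utils.multiclass import type_of_target

# Class labels (iris) versus continuous targets (the toy problem).
iris_target = load_iris().target
rng = np.random.default_rng(0)  # seed is an assumption, used for reproducibility
X = rng.uniform(0, 1, size=(5000, 10))
Y = rng.uniform(0, 1, size=(5000,))

# scikit-learn distinguishes the two kinds of target.
print(type_of_target(iris_target))  # 'multiclass'
print(type_of_target(Y))            # 'continuous'

# f_regression scores each feature against a continuous target,
# so SelectKBest can be fitted without the label-type error.
selector = SelectKBest(f_regression, k=2)
X_new = selector.fit_transform(X, Y)
print(X_new.shape)  # (5000, 2)
```

Since the toy features and targets are independent random draws, the two selected columns carry no real signal here; the point is only that the call runs with a regression-appropriate scoring function.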

Jan K answered Mar 18 '23 05:03