According to the docs, the OneVsRest classifier supports multilabel classification: http://scikit-learn.org/stable/modules/multiclass.html#multilabel-learning
Here's the code I'm trying to run:
from sklearn import metrics
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.cross_validation import train_test_split
from sklearn.svm import SVC
x = [[1,2,3],[3,3,2],[8,8,7],[3,7,1],[4,5,6]]
y = [['bar','foo'],['bar'],['foo'],['foo','jump'],['bar','fox','jump']]
y_enc = MultiLabelBinarizer().fit_transform(y)
train_x, train_y, test_x, test_y = train_test_split(x, y_enc, test_size=0.33)
clf = OneVsRestClassifier(SVC())
clf.fit(train_x, train_y)
predictions = clf.predict_proba(test_x)
my_metrics = metrics.classification_report( test_y, predictions)
print my_metrics
I get the following error:
Traceback (most recent call last):
  File "multilabel.py", line 178, in <module>
    clf.fit(train_x, train_y)
  File "/sklearn/lib/python2.6/site-packages/sklearn/multiclass.py", line 277, in fit
    Y = self.label_binarizer_.fit_transform(y)
  File "/sklearn/lib/python2.6/site-packages/sklearn/base.py", line 455, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/sklearn/lib/python2.6/site-packages/sklearn/preprocessing/label.py", line 302, in fit
    raise ValueError("Multioutput target data is not supported with "
ValueError: Multioutput target data is not supported with label binarization
Not using the MultiLabelBinarizer gives the same error, so I'm assuming that's not the problem. Does anyone know how to use this classifier for multilabel data?
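You are unpacking the output of train_test_split() in the wrong order. It returns the train/test splits of x first and then those of y, so in your code train_y actually receives the test rows of x. Those feature rows are not a valid multilabel target, which is why LabelBinarizer raises the error. Change this line: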
train_x, train_y, test_x, test_y = train_test_split(x, y_enc, test_size=0.33)
To this:
train_x, test_x, train_y, test_y = train_test_split(x, y_enc, test_size=0.33)
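To see the return order and why the swapped version fails, here is a small check. It assumes a recent scikit-learn, where train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in 0.20); random_state=0 is added only to make the split repeatable. type_of_target is scikit-learn's own helper for classifying target arrays.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer, MultiLabelBinarizer
from sklearn.utils.multiclass import type_of_target

x = [[1, 2, 3], [3, 3, 2], [8, 8, 7], [3, 7, 1], [4, 5, 6]]
y = [['bar', 'foo'], ['bar'], ['foo'], ['foo', 'jump'], ['bar', 'fox', 'jump']]
y_enc = MultiLabelBinarizer().fit_transform(y)

# train_test_split yields (x_train, x_test) first, then (y_train, y_test).
parts = train_test_split(x, y_enc, test_size=0.33, random_state=0)

# Unpacking as in the question: the second element is really x_test.
wrong_train_x, wrong_train_y, wrong_test_x, wrong_test_y = parts
# Correct unpacking.
train_x, test_x, train_y, test_y = parts

print(type_of_target(train_y))        # multilabel-indicator
print(type_of_target(wrong_train_y))  # multiclass-multioutput (raw features)

# OneVsRestClassifier.fit passes y through a LabelBinarizer, which rejects
# any "...-multioutput" target with the error from the question.
error_message = None
try:
    LabelBinarizer().fit(wrong_train_y)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```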
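Also, SVC only supports predict_proba when it is constructed as SVC(probability=True). More importantly, classification_report expects hard label predictions (for multilabel data, a 0/1 indicator matrix), not probabilities, so call clf.predict instead of clf.predict_proba.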
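A sketch of the difference, assuming a recent scikit-learn. It uses synthetic data from make_multilabel_classification instead of the question's five rows so that each per-label SVC has more to work with; the 0.5 cut-off used to turn probabilities into labels is an assumption, not something scikit-learn picks for you. Note that OneVsRestClassifier.predict thresholds the SVC decision function rather than the Platt-scaled probabilities, so the thresholded probabilities may not agree exactly with predict.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, Y = make_multilabel_classification(n_samples=100, n_classes=3, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=0)

# probability=True is what makes predict_proba usable on SVC.
clf = OneVsRestClassifier(SVC(probability=True, random_state=0))
clf.fit(X_train, Y_train)

hard = clf.predict(X_test)         # 0/1 indicator matrix, same shape as Y_test
proba = clf.predict_proba(X_test)  # per-label probabilities in [0, 1]

print(classification_report(Y_test, hard, zero_division=0))

# Raw probabilities are rejected: the report cannot mix a
# multilabel-indicator y_true with continuous predictions.
proba_error = None
try:
    classification_report(Y_test, proba)
except ValueError as exc:
    proba_error = str(exc)
    print(proba_error)

# Starting from probabilities, threshold them first (assumed cut-off 0.5).
from_proba = (proba >= 0.5).astype(int)
print(classification_report(Y_test, from_proba, zero_division=0))
```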
Putting it all together:
from sklearn import metrics
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
x = [[1,2,3],[3,3,2],[8,8,7],[3,7,1],[4,5,6]]
y = [['bar','foo'],['bar'],['foo'],['foo','jump'],['bar','fox','jump']]
mlb = MultiLabelBinarizer()
y_enc = mlb.fit_transform(y)
# train_test_split returns x_train, x_test, y_train, y_test in that order
train_x, test_x, train_y, test_y = train_test_split(x, y_enc, test_size=0.33)
# probability=True is only needed if you also call clf.predict_proba
clf = OneVsRestClassifier(SVC(probability=True))
clf.fit(train_x, train_y)
# predict returns a 0/1 indicator matrix, which classification_report accepts
predictions = clf.predict(test_x)
my_metrics = metrics.classification_report(test_y, predictions)
print(my_metrics)
This gives me no errors when I run it.
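Since the code above keeps a reference to mlb, a natural follow-up is mapping the indicator predictions back to tag names with MultiLabelBinarizer.inverse_transform and labelling the report rows via classification_report's target_names. This is a sketch assuming a recent scikit-learn; random_state=0 and zero_division=0 are additions (for a repeatable split and to silence warnings about labels that never appear in the tiny test set).

```python
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

x = [[1, 2, 3], [3, 3, 2], [8, 8, 7], [3, 7, 1], [4, 5, 6]]
y = [['bar', 'foo'], ['bar'], ['foo'], ['foo', 'jump'], ['bar', 'fox', 'jump']]

mlb = MultiLabelBinarizer()
y_enc = mlb.fit_transform(y)  # columns follow mlb.classes_ (sorted tag names)

train_x, test_x, train_y, test_y = train_test_split(
    x, y_enc, test_size=0.33, random_state=0)

clf = OneVsRestClassifier(SVC())
clf.fit(train_x, train_y)
predictions = clf.predict(test_x)

# Label the report rows with tag names instead of column indices.
report = classification_report(test_y, predictions,
                               target_names=list(mlb.classes_),
                               zero_division=0)
print(report)

# Turn each 0/1 indicator row back into a tuple of tag names.
predicted_tags = mlb.inverse_transform(predictions)
print(predicted_tags)
```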
I also got "ValueError: Multioutput target data is not supported with label binarization" from OneVsRestClassifier. In my case the training data was a plain Python list; after converting it with np.array(), it worked.
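Whether a list versus an array makes a difference may depend on the scikit-learn version and on what the list contains. One way to catch a malformed target early is to convert it yourself and check it with scikit-learn's type_of_target before calling fit. The helper name as_multilabel_target below is hypothetical, just a sketch of that check.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target


def as_multilabel_target(y):
    """Hypothetical helper: convert y to a NumPy array and make sure it is a
    0/1 indicator matrix before it reaches OneVsRestClassifier.fit."""
    y = np.asarray(y)
    kind = type_of_target(y)
    if kind != 'multilabel-indicator':
        raise ValueError("expected a multilabel-indicator target, got %r" % kind)
    return y


# A list of 0/1 rows becomes an indicator array and passes the check.
y_ok = as_multilabel_target([[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 1]])
print(y_ok.shape)

# Raw feature rows (what the swapped train_test_split unpacking produced)
# are rejected with a more descriptive message.
check_error = None
try:
    as_multilabel_target([[8, 8, 7], [3, 7, 1]])
except ValueError as exc:
    check_error = str(exc)
    print(check_error)
```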