When executing the multiclass example in the scikit-learn tutorial, I came across a slight oddity.
>>> import sklearn
>>> sklearn.__version__
'0.19.1'
>>> from sklearn.svm import SVC
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.preprocessing import LabelBinarizer
>>> X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
>>> y = [0, 0, 1, 1, 2] # Three classes
>>> clf = OneVsRestClassifier(estimator=SVC(random_state=0))
>>> clf.fit(X, y).predict(X)
array([0, 0, 1, 1, 2])
This is all fine. Now with one-hot encoding:
>>> y = LabelBinarizer().fit_transform(y)
>>> y
array([[1, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 0, 1]])
I would expect the label binarizer to only encode the target, not to influence the classifier. However, it yields a different result:
>>> clf.fit(X, y).predict(X)
array([[1, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 0],
       [0, 0, 0]])
Notebook on Google Colab (where, strangely, the same code yields yet another error):
OneVsRestClassifier is applying LabelBinarizer itself under the hood (see the source code in sklearn/multiclass.py):
def fit(self, X, y):
    ...
    self.label_binarizer_ = LabelBinarizer(sparse_output=True)
    Y = self.label_binarizer_.fit_transform(y)
    Y = Y.tocsc()
    self.classes_ = self.label_binarizer_.classes_
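You can see this switch in target interpretation directly with sklearn.utils.multiclass.type_of_target, a helper scikit-learn itself uses to decide how a target should be handled. A minimal sketch (the comments show the strings type_of_target returns for these inputs):

import numpy as np
from sklearn.utils.multiclass import type_of_target

y_int = [0, 0, 1, 1, 2]            # plain integer labels
y_hot = np.array([[1, 0, 0],       # one-hot / indicator matrix
                  [1, 0, 0],
                  [0, 1, 0],
                  [0, 1, 0],
                  [0, 0, 1]])

print(type_of_target(y_int))       # 'multiclass' -> ordinary one-vs-rest
print(type_of_target(y_hot))       # 'multilabel-indicator' -> multilabel mode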
So the extra manual conversion is unnecessary. In fact, it is interpreting your one-hot encoded y as multi-label input. From the documentation:
y : (sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]
    Multi-class targets. An indicator matrix turns on multilabel classification.
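The fix, then, is simply to pass the original integer labels and let OneVsRestClassifier do the binarization internally. If you only have the indicator matrix, you can recover the labels first; a minimal sketch using LabelBinarizer.inverse_transform (np.argmax would work equally well for strict one-hot rows):

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import SVC

X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
y = [0, 0, 1, 1, 2]

lb = LabelBinarizer()
Y = lb.fit_transform(y)                  # indicator matrix, shape (5, 3)

# Recover integer labels from the indicator matrix before fitting,
# so OneVsRestClassifier treats the problem as multiclass again.
y_back = lb.inverse_transform(Y)         # array([0, 0, 1, 1, 2])
# y_back = np.argmax(Y, axis=1)          # equivalent for one-hot rows

clf = OneVsRestClassifier(estimator=SVC(random_state=0))
print(clf.fit(X, y_back).predict(X))     # back to multiclass predictions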