sklearn LabelBinarizer returns vector when there are 2 classes

Tags: python, machine-learning, scikit-learn

The following code:

from sklearn.preprocessing import LabelBinarizer
lb = LabelBinarizer()
lb.fit_transform(['yes', 'no', 'no', 'yes'])

returns:

array([[1],
       [0],
       [0],
       [1]])

However, I would like one column per class:

array([[1, 0],
       [0, 1],
       [0, 1],
       [1, 0]])

(I need the data in this format so I can feed it to a neural network that uses the softmax function at the output layer.)

When there are more than 2 classes, LabelBinarizer behaves as desired:

from sklearn.preprocessing import LabelBinarizer
lb = LabelBinarizer()
lb.fit_transform(['yes', 'no', 'no', 'yes', 'maybe'])

returns:

array([[0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]])

Above, there is 1 column per class.

Is there any simple way to achieve the same (1 column per class) when there are 2 classes?

Edit: Based on yangjie's answer, I wrote a class that wraps LabelBinarizer to produce the behavior described above: http://pastebin.com/UEL2dP62

import numpy as np
from sklearn.preprocessing import LabelBinarizer


class LabelBinarizer2:

    def __init__(self):
        self.lb = LabelBinarizer()

    def fit(self, X):
        # Convert X to array
        X = np.array(X)
        # Fit X using the LabelBinarizer object
        self.lb.fit(X)
        # Save the classes
        self.classes_ = self.lb.classes_
        # Return self so calls can be chained, as scikit-learn estimators do
        return self

    def fit_transform(self, X):
        # Convert X to array
        X = np.array(X)
        # Fit + transform X using the LabelBinarizer object
        Xlb = self.lb.fit_transform(X)
        # Save the classes
        self.classes_ = self.lb.classes_
        if len(self.classes_) == 2:
            Xlb = np.hstack((Xlb, 1 - Xlb))
        return Xlb

    def transform(self, X):
        # Convert X to array
        X = np.array(X)
        # Transform X using the LabelBinarizer object
        Xlb = self.lb.transform(X)
        if len(self.classes_) == 2:
            Xlb = np.hstack((Xlb, 1 - Xlb))
        return Xlb

    def inverse_transform(self, Xlb):
        # Convert Xlb to array
        Xlb = np.array(Xlb)
        if len(self.classes_) == 2:
            X = self.lb.inverse_transform(Xlb[:, 0])
        else:
            X = self.lb.inverse_transform(Xlb)
        return X

Edit 2: It turns out yangjie has also written a new version of LabelBinarizer, awesome!

asked Aug 11 '15 16:08

applecider

1 Answer

I don't think there is a direct way to do this, especially if you also need inverse_transform.

But you can easily build the two-column label array with numpy:

In [18]: import numpy as np

In [19]: from sklearn.preprocessing import LabelBinarizer

In [20]: lb = LabelBinarizer()

In [21]: label = lb.fit_transform(['yes', 'no', 'no', 'yes'])

In [22]: label = np.hstack((label, 1 - label))

In [23]: label
Out[23]:
array([[1, 0],
       [0, 1],
       [0, 1],
       [1, 0]])

You can then apply inverse_transform to the first column:

In [24]: lb.inverse_transform(label[:, 0])
Out[24]:
array(['yes', 'no', 'no', 'yes'],
      dtype='<U3')
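One thing to watch: np.hstack((label, 1 - label)) puts the positive class (lb.classes_[1], here 'yes') in column 0, which is the reverse of the multiclass layout, where column i corresponds to lb.classes_[i]. If you want the binary case to follow the same column order, a minimal sketch (assuming dense output and the default neg_label=0, pos_label=1) is to stack the other way round and decode with argmax:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
y = lb.fit_transform(['yes', 'no', 'no', 'yes'])  # shape (4, 1); 'yes' maps to 1

# Column 0 now corresponds to lb.classes_[0] ('no'),
# column 1 to lb.classes_[1] ('yes').
onehot = np.hstack((1 - y, y))

# Decode by taking the index of the largest entry in each row
# and looking it up in the sorted class array.
decoded = lb.classes_[onehot.argmax(axis=1)]

print(onehot)
print(decoded)
```

The argmax step does not depend on the entries being exactly 0 or 1, so the same decoding should also work on the probability rows a softmax layer produces.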

Based on the above solution, you can write a class that inherits from LabelBinarizer, so that the operations and results are consistent for both the binary and multiclass cases:

from sklearn.preprocessing import LabelBinarizer
import numpy as np

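class MyLabelBinarizer(LabelBinarizer):
    def transform(self, y):
        Y = super().transform(y)
        # In the binary case the parent returns a single column,
        # so append its complement as a second column.
        if self.y_type_ == 'binary':
            return np.hstack((Y, 1-Y))
        else:
            return Y

    def inverse_transform(self, Y, threshold=None):
        # In the binary case only the first column carries the
        # encoding the parent class expects.
        if self.y_type_ == 'binary':
            return super().inverse_transform(Y[:, 0], threshold)
        else:
            return super().inverse_transform(Y, threshold)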

Then

lb = MyLabelBinarizer()
label1 = lb.fit_transform(['yes', 'no', 'no', 'yes'])
print(label1)
print(lb.inverse_transform(label1))
label2 = lb.fit_transform(['yes', 'no', 'no', 'yes', 'maybe'])
print(label2)
print(lb.inverse_transform(label2))

gives

[[1 0]
 [0 1]
 [0 1]
 [1 0]]
['yes' 'no' 'no' 'yes']
[[0 0 1]
 [0 1 0]
 [0 1 0]
 [0 0 1]
 [1 0 0]]
['yes' 'no' 'no' 'yes' 'maybe']
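
If you would rather not subclass anything, another option is to encode the labels as integers with LabelEncoder and index into an identity matrix. This is only a sketch of an alternative, not part of the solution above, and note that its columns follow the sorted order of le.classes_ ('no' first, 'yes' second), so the binary output is the mirror image of the one printed above:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# Integer codes are indices into le.classes_ (sorted: 'no' -> 0, 'yes' -> 1).
codes = le.fit_transform(['yes', 'no', 'no', 'yes'])

# Row i of the identity matrix is the one-hot vector for class index i,
# so fancy indexing with the codes yields one column per class.
onehot = np.eye(len(le.classes_), dtype=int)[codes]

# Invert with argmax, then map the integer codes back to the labels.
labels = le.inverse_transform(onehot.argmax(axis=1))

print(onehot)
print(labels)
```

This gives one column per class for any number of classes, including two, at the cost of handling the one-hot step yourself.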

answered Sep 19 '22 12:09

yangjie
