The following code:
from sklearn.preprocessing import LabelBinarizer
lb = LabelBinarizer()
lb.fit_transform(['yes', 'no', 'no', 'yes'])
returns:
array([[1],
       [0],
       [0],
       [1]])
However, I would like for there to be one column per class:
array([[1, 0],
       [0, 1],
       [0, 1],
       [1, 0]])
(I need the data in this format so I can give it to a neural network that uses the softmax function at the output layer)
When there are more than 2 classes, LabelBinarizer behaves as desired:
from sklearn.preprocessing import LabelBinarizer
lb = LabelBinarizer()
lb.fit_transform(['yes', 'no', 'no', 'yes', 'maybe'])
returns
array([[0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]])
Above, there is 1 column per class.
Is there any simple way to achieve the same (1 column per class) when there are 2 classes?
Edit: Based on yangjie's answer I wrote a class to wrap LabelBinarizer to produce the desired behavior described above: http://pastebin.com/UEL2dP62
import numpy as np
from sklearn.preprocessing import LabelBinarizer

class LabelBinarizer2:
    def __init__(self):
        self.lb = LabelBinarizer()

    def fit(self, X):
        # Convert X to array
        X = np.array(X)
        # Fit X using the LabelBinarizer object
        self.lb.fit(X)
        # Save the classes
        self.classes_ = self.lb.classes_

    def fit_transform(self, X):
        # Convert X to array
        X = np.array(X)
        # Fit + transform X using the LabelBinarizer object
        Xlb = self.lb.fit_transform(X)
        # Save the classes
        self.classes_ = self.lb.classes_
        # In the binary case, append the complementary column
        if len(self.classes_) == 2:
            Xlb = np.hstack((Xlb, 1 - Xlb))
        return Xlb

    def transform(self, X):
        # Convert X to array
        X = np.array(X)
        # Transform X using the LabelBinarizer object
        Xlb = self.lb.transform(X)
        # In the binary case, append the complementary column
        if len(self.classes_) == 2:
            Xlb = np.hstack((Xlb, 1 - Xlb))
        return Xlb

    def inverse_transform(self, Xlb):
        # Convert Xlb to array
        Xlb = np.array(Xlb)
        # In the binary case, only the first column carries the original encoding
        if len(self.classes_) == 2:
            X = self.lb.inverse_transform(Xlb[:, 0])
        else:
            X = self.lb.inverse_transform(Xlb)
        return X
Edit 2: It turns out yangjie has also written a new version of LabelBinarizer, awesome!
In a one-vs-all scheme, multi-class labels have to be converted to binary labels (belongs or does not belong to the class). LabelBinarizer makes this conversion easy with the transform method. At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.
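As a rough illustration of the prediction-time step, the sketch below passes a matrix of per-class confidences to inverse_transform. The scores array is made up for the example (it stands in for the outputs of one-vs-all models or a softmax layer) and is not taken from the thread.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Fit on three classes; classes_ is sorted alphabetically: ['maybe', 'no', 'yes']
lb = LabelBinarizer()
lb.fit(['yes', 'no', 'maybe'])

# Hypothetical confidences, one column per entry of lb.classes_
scores = np.array([[0.1, 0.2, 0.7],
                   [0.6, 0.3, 0.1]])

# For multiclass targets, inverse_transform picks the column with the highest score
predicted = lb.inverse_transform(scores)
print(predicted)
```

Each row is mapped to the class whose column holds the largest value, so the columns of the score matrix must follow the order of lb.classes_.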
MultiLabelBinarizer allows you to encode multiple labels per instance. To make the resulting array readable, you could build a DataFrame from it, using the encoded classes (available through the "classes_" attribute) as column names:
binarizer = MultiLabelBinarizer()
pd.DataFrame(binarizer.fit_transform(y), columns=binarizer.classes_)
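A self-contained version of that idea might look like the following. The label lists in y are invented for the example; any iterable of label collections should work the same way.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical data: each instance can carry several labels
y = [['yes', 'maybe'], ['no'], ['yes']]

binarizer = MultiLabelBinarizer()
encoded = binarizer.fit_transform(y)

# One column per class, named after the entries of binarizer.classes_
df = pd.DataFrame(encoded, columns=binarizer.classes_)
print(df)
```

Unlike LabelBinarizer, this always produces one column per class, but it expects each instance to be a collection of labels rather than a single label.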
LabelBinarizer is a scikit-learn class that accepts categorical data as input and returns a NumPy array.
I think there is no direct way to do it, especially if you want to have inverse_transform.
But you can use numpy to construct the label easily:
In [18]: import numpy as np
In [19]: from sklearn.preprocessing import LabelBinarizer
In [20]: lb = LabelBinarizer()
In [21]: label = lb.fit_transform(['yes', 'no', 'no', 'yes'])
In [22]: label = np.hstack((label, 1 - label))
In [23]: label
Out[23]:
array([[1, 0],
       [0, 1],
       [0, 1],
       [1, 0]])
Then you can use inverse_transform by slicing the first column:
In [24]: lb.inverse_transform(label[:, 0])
Out[24]:
array(['yes', 'no', 'no', 'yes'],
dtype='<U3')
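One detail worth keeping in mind: with np.hstack((label, 1 - label)) the first column marks 'yes', while lb.classes_ is sorted as ['no', 'yes'], so the column order is the reverse of classes_. If you would rather have the columns line up with classes_ (as they do in the multiclass case), a possible variant is sketched below. The argmax-based decoding is my own addition, not part of LabelBinarizer.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
# Shape (4, 1); a 1 marks lb.classes_[1], which is 'yes' here
col = lb.fit_transform(['yes', 'no', 'no', 'yes'])

# Put the complement first so that column i corresponds to lb.classes_[i]
onehot = np.hstack((1 - col, col))
print(onehot)

# Decode by taking the most likely column per row and indexing into classes_;
# this also works on softmax outputs, not only on exact 0/1 rows
decoded = lb.classes_[onehot.argmax(axis=1)]
print(decoded)
```

With this ordering 'yes' is encoded as [0, 1] rather than [1, 0], so pick whichever convention your network and decoding code agree on.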
Based on the above solution, you can write a class that inherits from LabelBinarizer, which makes the operations and results consistent for both the binary and the multiclass case.
from sklearn.preprocessing import LabelBinarizer
import numpy as np
class MyLabelBinarizer(LabelBinarizer):
    def transform(self, y):
        Y = super().transform(y)
        # Binary targets give a single column; append its complement
        if self.y_type_ == 'binary':
            return np.hstack((Y, 1 - Y))
        else:
            return Y

    def inverse_transform(self, Y, threshold=None):
        # For binary targets only the first column is the original encoding
        if self.y_type_ == 'binary':
            return super().inverse_transform(Y[:, 0], threshold)
        else:
            return super().inverse_transform(Y, threshold)
Then
lb = MyLabelBinarizer()
label1 = lb.fit_transform(['yes', 'no', 'no', 'yes'])
print(label1)
print(lb.inverse_transform(label1))
label2 = lb.fit_transform(['yes', 'no', 'no', 'yes', 'maybe'])
print(label2)
print(lb.inverse_transform(label2))
gives
[[1 0]
 [0 1]
 [0 1]
 [1 0]]
['yes' 'no' 'no' 'yes']
[[0 0 1]
 [0 1 0]
 [0 1 0]
 [0 0 1]
 [1 0 0]]
['yes' 'no' 'no' 'yes' 'maybe']
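If subclassing feels like too much, a different technique that may also fit is scikit-learn's OneHotEncoder, which produces one column per category even when there are only two. This is an alternative to the approach above, not what the answer proposes; the sketch assumes the labels are reshaped into a single-feature 2D array, since OneHotEncoder is designed for feature matrices.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# OneHotEncoder expects a 2D array: one row per sample, one column per feature
labels = np.array(['yes', 'no', 'no', 'yes']).reshape(-1, 1)

enc = OneHotEncoder()
# The default output is a sparse matrix; convert it to a dense array
onehot = enc.fit_transform(labels).toarray()
print(enc.categories_[0])
print(onehot)

# inverse_transform returns a 2D array; flatten it back to a label vector
decoded = enc.inverse_transform(onehot).ravel()
print(decoded)
```

Here the columns follow the sorted categories ('no', then 'yes'), so 'yes' comes out as [0, 1], and the values are floats rather than integers.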