Specifically, the LabelEncoder of creating an integer encoding of labels and the OneHotEncoder for creating a one hot encoding of integer encoded values.
To encode string array values, use the numpy. char. encode() method in Python Numpy. The arr is the input array to be encoded.
One dimensional array contains elements only in one dimension. In other words, the shape of the NumPy array should contain only one value in the tuple.
Your array a
defines the columns of the nonzero elements in the output array. You need to also define the rows and then use fancy indexing:
>>> a = np.array([1, 0, 3])
>>> b = np.zeros((a.size, a.max()+1))
>>> b[np.arange(a.size),a] = 1
>>> b
array([[ 0., 1., 0., 0.],
[ 1., 0., 0., 0.],
[ 0., 0., 0., 1.]])
>>> values = [1, 0, 3]
>>> n_values = np.max(values) + 1
>>> np.eye(n_values)[values]
array([[ 0., 1., 0., 0.],
[ 1., 0., 0., 0.],
[ 0., 0., 0., 1.]])
In case you are using keras, there is a built in utility for that:
from keras.utils.np_utils import to_categorical
categorical_labels = to_categorical(int_labels, num_classes=3)
And it does pretty much the same as @YXD's answer (see source-code).
Here is what I find useful:
def one_hot(a, num_classes):
return np.squeeze(np.eye(num_classes)[a.reshape(-1)])
Here num_classes
stands for number of classes you have. So if you have a
vector with shape of (10000,) this function transforms it to (10000,C). Note that a
is zero-indexed, i.e. one_hot(np.array([0, 1]), 2)
will give [[1, 0], [0, 1]]
.
Exactly what you wanted to have I believe.
PS: the source is Sequence models - deeplearning.ai
You can also use eye function of numpy:
numpy.eye(number of classes)[vector containing the labels]
You can use sklearn.preprocessing.LabelBinarizer
:
Example:
import sklearn.preprocessing
a = [1,0,3]
label_binarizer = sklearn.preprocessing.LabelBinarizer()
label_binarizer.fit(range(max(a)+1))
b = label_binarizer.transform(a)
print('{0}'.format(b))
output:
[[0 1 0 0]
[1 0 0 0]
[0 0 0 1]]
Amongst other things, you may initialize sklearn.preprocessing.LabelBinarizer()
so that the output of transform
is sparse.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With