
How can I one hot encode a list of strings with Keras?

I have a list:

code = ['<s>', 'are', 'defined', 'in', 'the', '"editable', 'parameters"', '\n', 'section.', '\n', 'A', 'larger', '`tsteps`', 'value', 'means', 'that', 'the', 'LSTM', 'will', 'need', 'more', 'memory', '\n', 'to', 'figure', 'out']

I want to convert it to a one-hot encoding. I tried:

to_categorical(code)

But I get an error: ValueError: invalid literal for int() with base 10: '<s>'

What am I doing wrong?

asked May 20 '19 at 20:05 by Shamoon



1 Answer

Keras's to_categorical only one-hot encodes data that has already been integer-encoded, which is why passing strings raises the ValueError. You can manually integer-encode your strings like so:

Manual encoding

# This integer encoding is based purely on position; you can do this in other ways.
# A repeated token keeps the index of its last occurrence, so the IDs are not contiguous.
integer_mapping = {x: i for i, x in enumerate(code)}

vec = [integer_mapping[word] for word in code]
# vec is
# [0, 1, 2, 3, 16, 5, 6, 22, 8, 22, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
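
Because the position-based mapping above leaves gaps in the IDs, the one-hot matrix ends up with columns that are never used. If that matters, one possible variation is to number the unique tokens in first-seen order instead. This is only a sketch: the short `tokens` list and the name `vocab` are made up for illustration and are not part of the original answer.

```python
# Hypothetical shortened token list, standing in for `code` from the question.
tokens = ['<s>', 'the', 'LSTM', '\n', 'the', 'memory', '\n']

# dict.fromkeys drops duplicates while preserving first-seen order,
# so every unique token gets a contiguous ID starting at 0.
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}

# Map each token (duplicates included) to its ID.
vec = [vocab[tok] for tok in tokens]
print(vec)
```

With this mapping the number of classes equals `len(vocab)`, so no one-hot column is wasted.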

Using scikit-learn

from sklearn.preprocessing import LabelEncoder
import numpy as np

code = np.array(code)

label_encoder = LabelEncoder()
vec = label_encoder.fit_transform(code)

# array([ 2,  6,  7,  9, 19,  1, 16,  0, 17,  0,  3, 10,  5, 21, 11, 18, 19,
#         4, 22, 14, 13, 12,  0, 20,  8, 15])

You can now feed this into keras.utils.to_categorical:

from keras.utils import to_categorical

to_categorical(vec)
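
To get a feel for the result, it can help to reproduce the encoding with plain NumPy. The sketch below is a rough equivalent of `to_categorical` rather than its actual implementation, assuming the IDs are non-negative integers and the number of classes is the largest ID plus one. The small `vec` array is a made-up example, not the output from the question.

```python
import numpy as np

# Hypothetical integer-encoded labels (for example, LabelEncoder output).
vec = np.array([2, 0, 1, 0])

# Assume the class count is max ID + 1.
num_classes = int(vec.max()) + 1

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with vec picks one row per label.
one_hot = np.eye(num_classes, dtype="float32")[vec]

# argmax along each row recovers the original integer IDs.
recovered = one_hot.argmax(axis=1)

print(one_hot.shape)
print(recovered)
```

Each row has a single 1 in the column of its class. To go from the recovered IDs back to strings, use the inverse of whichever integer mapping you built earlier.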

answered Oct 09 '22 at 14:10 by C.Nivs