Don't Understand how to Implement Embeddings for Categorical Features

Question

From the various examples I've found online, I still don't quite understand how to create embedding layers from my categorical data for neural network models, especially when I have a mix of numerical and categorical data. For example, take the data set below:

import numpy as np
import pandas as pd

numerical_df = pd.DataFrame(np.random.randint(0,100,size=(100, 3)), columns=['num_1','num_2','num_3'])

cat_df = pd.DataFrame(np.random.randint(0,5,size=(100, 3)), columns=['cat_1','cat_2','cat_3'])

df = numerical_df.join(cat_df)

I want to create embedding layers for my categorical data and use them in conjunction with my numerical data, but in all the examples I've seen it's almost as if the model just passes the entire dataset through the embedding layer, which is confusing.

As an example of my confusion, below is an example from Keras' documentation on sequential models. It's as though they just add the embedding step as the first layer and fit it to the entirety of x_train.

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM

max_features = 1024

model = Sequential()
model.add(Embedding(max_features, output_dim=256))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)

So ultimately, when it comes to creating embedding matrices, is there one per categorical variable, or one for all categorical variables? And how do I reconcile this with my other data that doesn't need an embedding matrix?

KevinH · Accepted Answer

To combine the categorical data with numerical data, your model should take multiple inputs using the functional API: one for each categorical variable and one for the numerical inputs. It's up to you how you then combine all that data, but I assume it makes sense to just concatenate everything and then continue with the rest of your model.
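To make the idea concrete outside of Keras, here is a minimal NumPy sketch of what "embed the categorical columns, then concatenate with the numerical ones" means. It is only an illustration: the table sizes (5 categories, 2-dimensional vectors) and the random table values are assumptions chosen to match the question's toy data, and in a real model the tables would be trainable weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch matching the question's layout: 3 numeric and 3 categorical columns.
numerical = rng.integers(0, 100, size=(4, 3)).astype(np.float32)
categorical = rng.integers(0, 5, size=(4, 3))  # integer codes in [0, 5)

# One embedding table per categorical column: 5 categories -> 2-d vectors.
# These are random stand-ins for weights that a network would learn.
n_categories, embed_dim = 5, 2
tables = [rng.normal(size=(n_categories, embed_dim)).astype(np.float32)
          for _ in range(categorical.shape[1])]

# An embedding is a row lookup: code k selects row k of that column's table.
embedded = [tables[j][categorical[:, j]] for j in range(categorical.shape[1])]

# The numeric columns skip the tables and are concatenated with the embeddings.
merged = np.concatenate([numerical] + embedded, axis=1)
print(merged.shape)  # (4, 9): 3 numeric + 3 columns * 2 embedding dims
```

Only the categorical codes ever touch an embedding table; the numerical columns pass through unchanged. In Keras the same wiring looks like this: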

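from keras.layers import Input, Embedding, Flatten, concatenate
from keras.models import Model

numerical_in = Input(shape=(3,))
cat_in       = Input(shape=(3,))
# input_dim=5 because each categorical column takes values 0..4
embed_layer  = Embedding(input_dim=5, output_dim=3)(cat_in)   # shape (batch, 3, 3)
embed_layer  = Flatten()(embed_layer)                         # shape (batch, 9)
merged_layer = concatenate([numerical_in, embed_layer])
output       = rest_of_your_model(merged_layer)
model        = Model(inputs=[numerical_in, cat_in], outputs=[output])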

...

model.fit(x=[numerical_df, cat_df], y=[your_expected_out])
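Note that the snippet above feeds all three categorical columns through a single shared embedding table, which only works because they happen to have the same five codes. If you want one embedding matrix per categorical variable, as described in the text, a sketch could look like the following. The target `y`, the `Dense(16)` hidden layer, and the embedding size of 3 are hypothetical choices for illustration, not something taken from the question.

```python
import numpy as np
from keras.layers import Input, Embedding, Flatten, Dense, concatenate
from keras.models import Model

# Same toy layout as the question: 100 rows, 3 numeric and 3 categorical columns.
numerical = np.random.randint(0, 100, size=(100, 3)).astype("float32")
categorical = np.random.randint(0, 5, size=(100, 3)).astype("int32")
y = np.random.randint(0, 2, size=(100, 1)).astype("float32")  # hypothetical binary target

numerical_in = Input(shape=(3,), name="numerical")

cat_inputs, cat_embedded = [], []
for name in ["cat_1", "cat_2", "cat_3"]:
    cat_in = Input(shape=(1,), dtype="int32", name=name)
    # input_dim = number of distinct codes in this column; output_dim is a free choice.
    emb = Embedding(input_dim=5, output_dim=3)(cat_in)  # (batch, 1, 3)
    cat_inputs.append(cat_in)
    cat_embedded.append(Flatten()(emb))                 # (batch, 3)

merged = concatenate([numerical_in] + cat_embedded)     # (batch, 3 + 3 * 3)
hidden = Dense(16, activation="relu")(merged)           # hypothetical model head
output = Dense(1, activation="sigmoid")(hidden)

model = Model(inputs=[numerical_in] + cat_inputs, outputs=output)
model.compile(loss="binary_crossentropy", optimizer="rmsprop")

# One array per Input, in the same order as `inputs`.
x = [numerical] + [categorical[:, [j]] for j in range(3)]
model.fit(x, y, batch_size=16, epochs=1, verbose=0)
```

With real data, each column's `input_dim` would be that column's own number of distinct categories, and the columns need to be integer-encoded as 0..input_dim-1 first.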

Tags:

python

neural-network

keras

data-science

trystuff
