Is ELMo a word embedding or a sentence embedding?

Supposedly, ELMo is a word embedding, so if the input is a sentence (a sequence of words), the output should be a sequence of vectors, one per word. Apparently, this is not the case.

The code below uses Keras and tensorflow_hub.

import numpy as np
from keras import layers
from keras.models import Model

# Two toy "sentences"; the Elmo layer expects a (batch_size, 1) array of strings.
a = ['aaa bbbb cccc uuuu vvvv wrwr', 'ddd ee fffff ppppp']
a = np.array(a, dtype=object)[:, np.newaxis]
# a.shape == (2, 1)

input_text = layers.Input(shape=(1,), dtype="string")
embedding = ElmoEmbeddingLayer()(input_text)  # defined in the notebook linked below
model = Model(inputs=[input_text], outputs=embedding)

model.summary()

The class ElmoEmbeddingLayer is from https://github.com/strongio/keras-elmo/blob/master/Elmo%20Keras.ipynb.

b = model.predict(a)
# b.shape == (2, 1024)

Apparently, the embedding assigns a single 1024-dimensional vector to each sentence rather than one vector per word. This is confusing.
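
For reference, the forward pass of ElmoEmbeddingLayer in that notebook looks roughly like this (paraphrased, so details may differ slightly from the original):

def call(self, x, mask=None):
    # Squeeze the (batch_size, 1) string tensor to (batch_size,), run the hub
    # module, and take the 'default' entry of its output dictionary.
    result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                       as_dict=True,
                       signature='default')['default']
    return result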

Thank you.


1 Answer

I think I've found the answer: it's in the module documentation at https://tfhub.dev/google/elmo/2.

The output dictionary contains:

  1. word_emb: the character-based word representations with shape [batch_size, max_length, 512].

  2. lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024].

  3. lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024].

  4. elmo: the weighted sum of the 3 layers, where the weights are trainable. This tensor has shape [batch_size, max_length, 1024].

  5. default: a fixed mean-pooling of all contextualized word representations with shape [batch_size, 1024].

The 4th output (elmo) is the actual word embedding: one 1024-dimensional vector per token. The 5th (default) mean-pools that sequence into a single vector, effectively turning the whole thing into a sentence embedding. The ElmoEmbeddingLayer in the notebook indexes the output dictionary with ['default'] (see the sketch in the question), which is why model.predict returns shape (2, 1024): one vector per sentence.
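
To get one vector per word instead, select the elmo output rather than default. A minimal sketch using the TF1-style hub.Module API that this module targets (the sentences and variable names here are just illustrative):

import tensorflow as tf
import tensorflow_hub as hub

# Load the module (TF1-style; elmo/2 is not a TF2 SavedModel).
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=False)

sentences = ["the cat sat on the mat", "dogs are great"]
outputs = elmo(sentences, signature="default", as_dict=True)

word_emb = outputs["elmo"]     # [batch_size, max_length, 1024], one vector per word
sent_emb = outputs["default"]  # [batch_size, 1024], mean-pooled over words

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    w, s = sess.run([word_emb, sent_emb])
    print(w.shape)  # (2, 6, 1024) -- max_length is the longest sentence
    print(s.shape)  # (2, 1024)

In the Keras layer from the notebook, the corresponding change would be roughly to index ['elmo'] instead of ['default'] in call and to adjust compute_output_shape to return (batch_size, max_length, 1024).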
