
Keras: Embedding in LSTM

In a Keras example on LSTM modeling of the IMDB sequence data (https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py), there is an embedding layer before the input enters an LSTM layer:

model.add(Embedding(max_features,128)) #max_features=20000
model.add(LSTM(128))

What does the embedding layer really do? In this case, does that mean the length of the input sequence into the LSTM layer is 128? If so, can I write the LSTM layer as:

model.add(LSTM(128, input_shape=(128, 1)))

But it is also noted that the input X_train has been subjected to pad_sequences processing:

print('Pad sequences (samples x time)')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen) #maxlen=80
X_test = sequence.pad_sequences(X_test, maxlen=maxlen) #maxlen=80

It seems the input sequence length is 80?
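
For illustration (a minimal sketch, not part of the original example), pad_sequences pads or truncates every list of word indexes to exactly maxlen entries:

from keras.preprocessing import sequence

seqs = [[1, 2, 3], [4, 5, 6, 7, 8]]
print(sequence.pad_sequences(seqs, maxlen=4))
# [[0 1 2 3]   <- padded with zeros at the front (default padding='pre')
#  [5 6 7 8]]  <- truncated from the front (default truncating='pre')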

asked Jun 26 '17 by jingweimo

1 Answer

To quote the documentation:

Turns positive integers (indexes) into dense vectors of fixed size. e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

Basically, this transforms each index (representing a word in your IMDB review) into a dense vector of the given size (in your case 128).
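
Here is a minimal sketch (my own illustration, not from the IMDB script) showing the shape the Embedding layer produces:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

max_features = 20000  # vocabulary size, as in the IMDB example
model = Sequential()
model.add(Embedding(max_features, 128))  # each index -> a 128-dim vector
model.compile('rmsprop', 'mse')          # loss/optimizer are irrelevant here

# A batch of 2 "reviews", each a sequence of 3 word indexes.
x = np.array([[4, 20, 7], [15, 3, 9]])
print(model.predict(x).shape)  # (2, 3, 128): batch x timesteps x embedding_dim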

If you don't know what embeddings are in general, here is the Wikipedia definition:

Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers in a low-dimensional space relative to the vocabulary size ("continuous space").

Coming back to the other question you've asked:

In this case, does that mean the length of the input sequence into the LSTM layer is 128?

Not quite. For recurrent nets you have a time dimension and a feature dimension. 128 is your feature dimension: how many dimensions each embedding vector has. The time dimension in your example is what is stored in maxlen, which is used to generate the training sequences.
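
To see both dimensions at once, here is a sketch mirroring the IMDB example (input_length added so the summary prints concrete shapes):

from keras.models import Sequential
from keras.layers import Embedding, LSTM

maxlen = 80           # time dimension: length of each padded sequence
max_features = 20000  # vocabulary size

model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(LSTM(128))
model.summary()
# Embedding output shape: (None, 80, 128) -> 80 timesteps, 128 features each
# LSTM output shape:      (None, 128)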

The 128 you supply to the LSTM layer, on the other hand, is the number of output units of the LSTM (the dimensionality of its hidden state), and it is independent of the embedding size.
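
A quick way to convince yourself the two 128s are unrelated: a hypothetical variant with different sizes works just as well:

from keras.models import Sequential
from keras.layers import Embedding, LSTM

model = Sequential()
model.add(Embedding(20000, 64))  # 64-dimensional embeddings
model.add(LSTM(32))              # 32 LSTM units -> output shape (batch, 32)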

answered Sep 24 '22 by Thomas Jungblut