How to configure input shape for bidirectional LSTM in Keras

I'm facing the following issue. I have a large number of documents that I want to encode using a bidirectional LSTM. Each document has a different number of words, and each word can be thought of as a timestep.

When configuring the bidirectional LSTM we are expected to provide the time series length. When I train the model, this value will be different for each batch. Should I choose a value for timeseries_size that is the biggest document size I will allow, meaning any document longer than that will not be encoded?

Example config:

from tensorflow.keras.layers import Bidirectional, LSTM

Bidirectional(LSTM(128, return_sequences=True), input_shape=(timeseries_size, encoding_size))
asked Apr 14 '18 by Funzo


People also ask

What's the output shape of a bidirectional LSTM layer with 64 units?

This is because you are using a Bidirectional layer: the forward and backward passes are concatenated, so your output will be (None, None, 64 + 64 = 128).

What is the shape of the input in a LSTM?

The input of the LSTM is always a 3D array of shape (batch_size, timesteps, features). The output of the LSTM can be a 2D or 3D array, depending on the return_sequences argument.
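
Both answers can be checked directly; a minimal sketch, assuming TensorFlow 2.x Keras and illustrative sizes:

import numpy as np
from tensorflow.keras.layers import Bidirectional, LSTM

x = np.random.rand(4, 10, 8).astype("float32")  # (batch_size, timesteps, features)

print(Bidirectional(LSTM(64, return_sequences=True))(x).shape)  # (4, 10, 128)
print(Bidirectional(LSTM(64))(x).shape)                         # (4, 128)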


1 Answer

This is a well-known problem and it concerns both ordinary and bidirectional RNNs. This discussion on GitHub might help you. In essence, here are the most common options:

  • A simple solution is to set timeseries_size to the maximum length over the training set and pad the shorter sequences with zeros; a padding sketch follows after this list. An obvious downside is wasted memory when the training set contains both very long and very short inputs.

  • Separate input samples into buckets of different lengths, e.g. one bucket for length <= 16, another for length <= 32, and so on. Basically this means training several separate LSTMs for different sets of sentences. This approach (known as bucketing) requires more effort, but it is currently considered the most efficient and is actually used in the state-of-the-art translation engine Tensorflow Neural Machine Translation; a bucketing sketch also follows below.
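
For the padding option, here is a minimal runnable sketch, assuming TensorFlow 2.x Keras; encoding_size and the toy documents are illustrative placeholders, not from the original post:

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, Bidirectional, LSTM

encoding_size = 8
docs = [np.random.rand(n, encoding_size) for n in (5, 12, 7)]  # variable-length documents

timeseries_size = max(len(d) for d in docs)  # max length over the training set
padded = pad_sequences(docs, maxlen=timeseries_size, dtype="float32", padding="post")

model = Sequential([
    # Masking tells downstream layers to skip the all-zero padded timesteps
    Masking(mask_value=0.0, input_shape=(timeseries_size, encoding_size)),
    Bidirectional(LSTM(128, return_sequences=True)),
])
print(model(padded).shape)  # (3, 12, 256): forward and backward states concatenated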

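And a hedged sketch of the bucketing idea; the bucket boundaries and helper names are illustrative assumptions, not the actual NMT implementation:

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def bucket_key(length, boundaries=(16, 32, 64)):
    """Return the smallest boundary that fits the sequence length."""
    for b in boundaries:
        if length <= b:
            return b
    return boundaries[-1]  # overflow: truncated to the largest bucket

def make_buckets(docs):
    buckets = {}
    for doc in docs:
        buckets.setdefault(bucket_key(len(doc)), []).append(doc)
    return buckets

# Each bucket is padded only to its own boundary, so short documents never
# carry the padding cost of the longest document in the corpus.
docs = [np.random.rand(n, 8) for n in (5, 12, 20, 30, 50)]
for maxlen, group in make_buckets(docs).items():
    batch = pad_sequences(group, maxlen=maxlen, dtype="float32", padding="post", truncating="post")
    print(maxlen, batch.shape)  # 16 (2, 16, 8), then 32 (2, 32, 8), then 64 (1, 64, 8)

Note that instead of training a fully separate model per bucket, a single Keras model can consume every bucket if it is built with input_shape=(None, encoding_size), since Keras allows the timestep dimension to be left as None.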
answered Sep 20 '22 by Maxim