Keras SimpleRNN confusion

Tags: python, keras, rnn

...coming from TensorFlow, where pretty much every shape is defined explicitly, I am confused about Keras' API for recurrent models. Getting an Elman network to work in TF was pretty easy, but Keras refuses to accept the shapes I consider correct...

For example:

import keras as k

x = k.layers.Input(shape=(2,))
y = k.layers.Dense(10)(x)
m = k.models.Model(x, y)

...works perfectly, and according to model.summary() I get an input layer with shape (None, 2) followed by a dense layer with output shape (None, 10). That makes sense, since Keras automatically adds the first dimension for batch processing.

However, the following code:

x = k.layers.Input(shape=(2,))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)

raises the exception ValueError: Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2.

It works only if I add another dimension:

x = k.layers.Input(shape=(2,1))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)

...but now, of course, my input would not be (None, 2) anymore.

model.summary():

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 2, 1)              0         
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 10)                120       
=================================================================

How can I have an input of shape batch_size x 2 when I just want to feed vectors with 2 values to the network?

Furthermore, how would I chain RNN cells?

x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)

...raises the same exception about incompatible dimensions.

This sample here works:

x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10, return_sequences=True)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)

...but then layer h does not output (None, 10) anymore but (None, 2, 10), since it returns the whole sequence instead of just the "regular" RNN cell output.

Why is this needed at all?

Moreover: where are the states? Do they just default to 1 recurrent state?

asked May 31 '18 by daniel451


1 Answer

The documentation touches on the expected shapes of recurrent components in Keras; let's look at your case:

  1. Any RNN layer in Keras expects a 3D input of shape (batch_size, timesteps, features). This means you have time-series data.
  2. The RNN layer then iterates over the second (time) dimension of the input using a recurrent cell, which performs the actual recurrent computation.
  3. If you specify return_sequences, you collect the output for every timestep, getting another 3D tensor (batch_size, timesteps, units); otherwise you only get the last output, which is (batch_size, units). (A short sketch of these shape rules follows this list.)
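A minimal sketch of the shape rules above, assuming import keras as k as in the question (the timestep and feature sizes 5 and 2 are arbitrary illustration values):

import keras as k

x = k.layers.Input(shape=(5, 2))  # (batch_size, timesteps=5, features=2); batch_size stays implicit
last = k.layers.SimpleRNN(10)(x)  # last output only: shape (None, 10)
seq = k.layers.SimpleRNN(10, return_sequences=True)(x)  # every timestep: shape (None, 5, 10)
m = k.models.Model(x, [last, seq])
m.summary()  # shows both output shapes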

Now returning to your questions:

  1. You mention vectors, but shape=(2,) is a single vector, so this doesn't work. shape=(2, 1) works because now you have 2 vectors of size 1; note that these shapes exclude batch_size. So to feed vectors of size 2 you need shape=(how_many_vectors, 2), where the first dimension is the number of vectors you want your RNN to process, i.e. the timesteps in this case.
  2. To chain RNN layers you need to feed 3D data, because that is what RNNs expect. When you specify return_sequences, the RNN layer returns its output at every timestep, so it can be chained to another RNN layer.
  3. States are a collection of vectors that an RNN cell uses; an LSTM uses 2, a GRU has 1 hidden state, which is also the output. They default to zeros but can be specified when calling the layer using initial_state=[...] as a list of tensors. (A sketch combining chaining and initial states follows this list.)
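A sketch putting these points together, chaining two SimpleRNN layers and supplying an explicit initial state; the shapes 4 and 2 are arbitrary, and feeding the state through a second Input is just one way to provide it in the functional API:

import keras as k

x = k.layers.Input(shape=(4, 2))  # 4 timesteps of 2-feature vectors per sample
s = k.layers.Input(shape=(10,))   # explicit initial hidden state, one vector per sample

# return_sequences=True keeps the output 3D so the next RNN layer can consume it
h = k.layers.SimpleRNN(10, return_sequences=True)(x, initial_state=[s])  # (None, 4, 10)
y = k.layers.SimpleRNN(10)(h)                                            # (None, 10)
m = k.models.Model([x, s], y)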

There is already a post about the difference between RNN layers and RNN cells in Keras that might help clarify the situation further.
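Briefly, a cell computes a single timestep and holds the weights, while the RNN layer loops the cell over the time dimension. A minimal sketch of that distinction, using the Keras 2.x RNN/SimpleRNNCell API:

import keras as k

x = k.layers.Input(shape=(4, 2))
cell = k.layers.SimpleRNNCell(10)  # computes one timestep
y = k.layers.RNN(cell)(x)          # the RNN layer iterates the cell over time
m = k.models.Model(x, y)           # output shape: (None, 10)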

answered Nov 15 '22 by nuric