Keras LSTM TimeDistributed, stateful

Are there any detailed explanations of how TimeDistributed, stateful and return_sequences work? Do I have to set shuffle=False in both cases? Does it work for windows (1-11, 2-12, 3-13, etc.) or should it be used in batches (1-11, 12-22, 23-33, etc.)?

I'm particularly interested in LSTM layers.

asked Oct 23 '17 by Alex Ozerov

1 Answer

TimeDistributed:

This wrapper does not change how the wrapped layer works. Its purpose is to add an extra "time" dimension (which may represent something other than time). The wrapped layer is applied to each slice of the input tensor along this time dimension.

For instance, if a layer expects an input shape with 3 dimensions, say (batch, length, features), using the TimeDistributed wrapper will make it expect 4 dimensions: (batch, timeDimension, length, features).

The layer will then be "copied" and applied equally to each element in the time dimension.

With an LSTM layer, it works the same way. Although an LSTM layer already expects a time dimension in its input shape, (batch, timeSteps, features), you can use TimeDistributed to add yet another "time" dimension (which may mean anything, not exactly time) and make this LSTM layer be reused for each element in this new time dimension.

  • LSTM - expects inputs (batch, timeSteps, features)
  • TimeDistributed(LSTM()) - expects inputs (batch, superSteps, timeSteps, features)

In any case, the LSTM will only actually perform its recurrent calculations in the timeSteps dimension. The other time dimension is just replicating this layer many times.
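
As a quick shape check, here is a minimal sketch (TensorFlow/Keras; all layer sizes and step counts are arbitrary choices, not from the question):

```python
from tensorflow.keras.layers import Input, LSTM, TimeDistributed
from tensorflow.keras.models import Model

# Plain LSTM: expects (batch, timeSteps, features)
inp = Input(shape=(10, 8))                 # 10 time steps, 8 features
out = LSTM(4)(inp)                         # output: (None, 4)

# TimeDistributed(LSTM): expects (batch, superSteps, timeSteps, features)
inp2 = Input(shape=(5, 10, 8))             # 5 "super steps", each a full sequence
out2 = TimeDistributed(LSTM(4))(inp2)      # output: (None, 5, 4)

print(out.shape, out2.shape)
```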

TimeDistributed + Dense:

The Dense layer (and maybe a few others) already supports 3D inputs, although the standard is 2D: (batch, inputFeatures).

Using TimeDistributed or not with Dense layers is optional, and the result is the same: if your data is 3D, the Dense layer will be repeated along the second dimension.
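
A small shape check of this equivalence (sizes are arbitrary):

```python
from tensorflow.keras.layers import Input, Dense, TimeDistributed

inp = Input(shape=(10, 8))                 # 3D data: (batch, steps, features)
out_plain = Dense(4)(inp)                  # applied to each of the 10 steps
out_td = TimeDistributed(Dense(4))(inp)    # same behavior via the wrapper

print(out_plain.shape, out_td.shape)       # both: (None, 10, 4)
```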

Return sequences:

This is well explained in the documentation.

With recurrent layers, Keras will use the timeSteps dimension to perform its recurrent steps. Each step naturally produces an output.

You can choose to get the outputs for all steps (return_sequences=True) or to get just the last output (return_sequences=False).

Consider an input shape like (batch, timeSteps, inputFeatures) and a layer with outputFeatures units:

  • When return_sequences=True, the output shape is (batch, timeSteps, outputFeatures)
  • When return_sequences=False, the output shape is (batch, outputFeatures)

In any case, if you use a TimeDistributed wrapper, the superSteps dimension will be in the input and the output, unchanged.
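
A minimal sketch of both cases (sizes are arbitrary):

```python
from tensorflow.keras.layers import Input, LSTM

inp = Input(shape=(10, 8))                        # (batch, timeSteps, inputFeatures)
all_steps = LSTM(4, return_sequences=True)(inp)   # (None, 10, 4): one output per step
last_step = LSTM(4, return_sequences=False)(inp)  # (None, 4): only the last output

print(all_steps.shape, last_step.shape)
```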

Stateful = True

Usually, if you can put all your sequences with all their steps in an input array, everything is fine and you don't need stateful=True layers.

Keras creates a "state" for each sequence in the batch (the batch dimension is equal to the number of sequences). When Keras finishes processing a batch, it automatically resets the states, meaning: we have reached the end (last time step) of the sequences, so the next batch brings new sequences starting from their first step.

When using stateful=True, these states are not reset. This means that sending another batch to the model will not be interpreted as a new set of sequences, but as additional steps for the sequences that were processed before. You must then call model.reset_states() manually to tell the model that you've reached the last step of the sequences, or that you will start new sequences.

The only case that needs shuffle=False is this stateful=True case, because each batch contains many sequences, and across batches these sequences must be kept in the same order so that the states for each sequence don't get mixed.
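
A minimal sketch of this behavior (random data and arbitrary sizes, just to show the batch mechanics):

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

inp = Input(batch_shape=(2, 10, 8))   # stateful layers need a fixed batch size: 2 sequences
out = LSTM(4, stateful=True)(inp)
model = Model(inp, out)

part1 = np.random.rand(2, 10, 8)      # steps 1-10 of two sequences
part2 = np.random.rand(2, 10, 8)      # steps 11-20 of the SAME two sequences

model.predict(part1)                  # states are kept after this batch
model.predict(part2)                  # read as a continuation, not as new sequences
model.reset_states()                  # now the next batch starts new sequences
```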

Stateful layers are good for:

  • Data that is too big: it doesn't fit in memory if you use all time steps at once
  • Continuously generating time steps and feeding each new step back as input for the next prediction, without fixed sizes (you create these loops yourself in the code, as sketched after this list)
  • (any comments from other users?? )
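
A rough sketch of such a generation loop (the model here is untrained and the sizes are invented; the point is only the loop structure with stateful=True and one step per prediction):

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

inp = Input(batch_shape=(1, 1, 8))     # one sequence, one step per batch
x = LSTM(16, stateful=True)(inp)
out = Dense(8)(x)                      # predicts the features of the next step
gen = Model(inp, out)

step = np.random.rand(1, 1, 8)         # seed step (random here, real data in practice)
generated = []
for _ in range(20):
    next_step = gen.predict(step, verbose=0)  # uses the state kept from previous steps
    generated.append(next_step)
    step = next_step.reshape(1, 1, 8)         # feed the prediction back as input
gen.reset_states()                     # done generating this sequence
```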

Working with windows

So far, the only way I could work with windows was by replicating the data.

The input array should be organized in windows, one sequence per window. You could optionally take advantage of the TimeDistributed wrapper if you want to keep all windows together as a single batch entry, but you can also make each window an individual sequence.

A stateful=True layer won't work with windows because of the states: if one batch contains steps 1 to 12, the next batch will be expected to start at step 13 to keep the continuity.
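
One way to do this replication (an illustrative sketch; the window length 11 mirrors the question's example):

```python
import numpy as np

data = np.random.rand(100, 8)          # one long sequence: 100 steps, 8 features
window = 11                            # windows 1-11, 2-12, 3-13, ...
windows = np.stack([data[i:i + window] for i in range(len(data) - window + 1)])
print(windows.shape)                   # (90, 11, 8): 90 overlapping sequences
```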

answered Sep 19 '22 by Daniel Möller