
Stateful LSTM and stream predictions

I've trained an LSTM model (built with Keras and TF) on multiple batches of 7 samples with 3 features each, shaped like the sample below (the numbers are just placeholders for the purpose of explanation); each batch is labeled 0 or 1:

Data:

[
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   ...
]

i.e. batches of m sequences, each of length 7, whose elements are 3-dimensional vectors (so each batch has shape (m, 7, 3))

Target:

[
   [1]
   [0]
   [1]
   ...
]
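
For illustration, a minimal sketch (my own, not from the original post) of arrays with these shapes, using random placeholder values and an assumed m:

    import numpy as np

    m = 100                                    # number of sequences (placeholder)
    data = np.random.rand(m, 7, 3)             # m sequences, 7 samples each, 3 features
    target = np.random.randint(0, 2, (m, 1))   # one binary label per sequence

    print(data.shape)    # (m, 7, 3)
    print(target.shape)  # (m, 1)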

In my production environment, the data is a stream of samples, each with 3 features ([1,2,3], [1,2,3], ...). I would like to feed each sample to my model as it arrives and get an intermediate probability without waiting for the entire batch of 7 - see the animation below.

[Animation from the original post: samples streaming into the model one at a time.]

One of my thoughts was to pad the batch with zeros for the missing samples, e.g. [[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[1,2,3]], but that seems inefficient.
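
For reference, a rough sketch of that padding workaround (the helper name and the use of NumPy are my own assumptions; received holds the samples seen so far):

    import numpy as np

    def pad_partial_sequence(received, seq_len=7, num_features=3):
        # left-pad the samples received so far with zero vectors up to seq_len
        padded = np.zeros((1, seq_len, num_features))
        padded[0, seq_len - len(received):] = received
        return padded

    batch = pad_partial_sequence([[1, 2, 3]])   # shape (1, 7, 3), ready for prediction

Note that unless a Masking layer is used, the zero vectors are still processed by the LSTM and influence its state.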

I would appreciate any help pointing me in the right direction, both on saving the LSTM's intermediate state in a persistent way while waiting for the next sample, and on predicting with partial data on a model trained with a specific batch size.


Update, including model code:

    import keras_metrics                 # third-party package providing precision/recall metrics
    from keras import optimizers
    from keras.models import Sequential
    from keras.layers import LSTM, LeakyReLU, Dropout, Flatten, Dense

    # data is the (m, 7, 3) array described above; f1 is a custom metric defined elsewhere
    opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=10e-8, decay=0.001)
    model = Sequential()

    num_features = data.shape[2]
    num_samples = data.shape[1]

    first_lstm = LSTM(32, batch_input_shape=(None, num_samples, num_features),
                      return_sequences=True, activation='tanh')
    model.add(first_lstm)
    model.add(LeakyReLU())
    model.add(Dropout(0.2))
    model.add(LSTM(16, return_sequences=True, activation='tanh'))
    model.add(Dropout(0.2))
    model.add(LeakyReLU())
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer=opt,
                  metrics=['accuracy', keras_metrics.precision(),
                           keras_metrics.recall(), f1])

Model Summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 100, 32)           6272      
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 100, 32)           0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 100, 32)           0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 100, 16)           3136      
_________________________________________________________________
dropout_2 (Dropout)          (None, 100, 16)           0         
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 100, 16)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1600)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 1601      
=================================================================
Total params: 11,009
Trainable params: 11,009
Non-trainable params: 0
_________________________________________________________________
Asked Nov 07 '18 by Shlomi Schwartz



1 Answer

I think there might be an easier solution.

If your model does not have convolutional layers or any other layers that act upon the length/steps dimension, you can simply mark it as stateful=True.

Warning: your model does have layers that act on the length dimension!

The Flatten layer transforms the length dimension into a feature dimension. This will completely prevent you from achieving your goal. If the Flatten layer is expecting 7 steps, you will always need 7 steps.

So, before applying the rest of this answer, fix your model so it does not use the Flatten layer. Instead, just remove return_sequences=True from the last LSTM layer (return_sequences defaults to False), so that layer outputs only its final step.

The following code fixes that and also prepares a few things to be used in the rest of this answer:

def createModel(forTraining):

    #model for training: stateful=False, any batch size, full-length sequences
    if forTraining == True:
        batchSize = None
        length = num_samples
        stateful = False

    #model for predicting: stateful=True, fixed batch size of 1, variable steps
    #(you will feed one step at a time)
    else:
        batchSize = 1
        length = None
        stateful = True

    model = Sequential()

    first_lstm = LSTM(32, 
        batch_input_shape=(batchSize, length, num_features), 
        return_sequences=True, activation='tanh', 
        stateful=stateful)   

    model.add(first_lstm)
    model.add(LeakyReLU())
    model.add(Dropout(0.2))

    #this is the last LSTM layer, use return_sequences=False
    model.add(LSTM(16, return_sequences=False, stateful=stateful, activation='tanh'))

    model.add(Dropout(0.2))
    model.add(LeakyReLU())

    #don't add a Flatten!!!
    #model.add(Flatten())

    model.add(Dense(1, activation='sigmoid'))

    if forTraining == True:
        compileThisModel(model)

    return model

With this, you will be able to train with 7 steps and predict with one step. Otherwise it will not be possible.
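
compileThisModel and trainThisModel (used below) are placeholders in this answer; a minimal sketch of what they could look like, reusing the compile settings from the question (the extra metrics are trimmed, and the epochs/batch_size values are arbitrary assumptions):

    from keras import optimizers

    def compileThisModel(model):
        # same optimizer and loss as in the question; extra metrics omitted here
        opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=10e-8, decay=0.001)
        model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

    def trainThisModel(model):
        # data: (m, 7, 3) array, target: (m, 1) array, as described in the question
        model.fit(data, target, epochs=100, batch_size=32, shuffle=True)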

Using a stateful model as a solution to your question

First, train this new model again, because it has no Flatten layer:

trainingModel = createModel(forTraining=True)
trainThisModel(trainingModel)

Now, with this trained model, create a new model in exactly the same way, but with stateful=True in all of its LSTM layers, and copy the weights over from the trained model.

Since these new layers will need a fixed batch size (Keras' rules), I assumed it would be 1 (one single stream is coming, not m streams) and added it to the model creation above.

predictingModel = createModel(forTraining=False)
predictingModel.set_weights(trainingModel.get_weights())
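
If training and prediction happen in different processes (training offline, predicting in production), the same weight transfer can go through a file instead; a small sketch (the file name is an arbitrary assumption):

    # in the training process
    trainingModel.save_weights('lstm_weights.h5')

    # in the production process
    predictingModel = createModel(forTraining=False)
    predictingModel.load_weights('lstm_weights.h5')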

And voilà. Just predict the outputs of the model with a single step:

pseudo for loop as samples arrive to your model:
    prob = predictingModel.predict_on_batch(sample)

    #where sample.shape == (1, 1, 3)

When you decide that you have reached the end of what you consider a continuous sequence, call predictingModel.reset_states() so you can safely start a new sequence without the model thinking it should be appended to the end of the previous one.
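
Putting it together, a minimal sketch of such a production loop (sample_stream and is_end_of_sequence are hypothetical stand-ins for however your samples arrive and however you detect sequence boundaries):

    import numpy as np

    for sample in sample_stream:                       # each sample is e.g. [1, 2, 3]
        x = np.array(sample).reshape(1, 1, 3)          # batch of 1, single step, 3 features
        prob = predictingModel.predict_on_batch(x)[0, 0]
        print('intermediate probability:', prob)

        if is_end_of_sequence(sample):
            predictingModel.reset_states()             # start the next sequence fresh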


Saving and loading states

Just get and set them, saving with h5py:

def saveStates(model, saveName):

    f = h5py.File(saveName,'w')

    for l, lay in enumerate(model.layers):
        #if you have nested models, 
            #consider making this recurrent testing for layers in layers
        if isinstance(lay,RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s),
                                 data=K.eval(stat), 
                                 dtype=K.dtype(stat))

    f.close()


def loadStates(model, saveName):

    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())

    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)

        K.set_value(model.layers[layer].states[state], f.get(stateKey))

    f.close()
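
This is what allows persisting the LSTM state while waiting for the next sample, as asked in the question. A minimal sketch of how the two helpers could wrap each prediction (the file name and the on_new_sample hook are my own assumptions):

    import os
    import numpy as np

    STATE_FILE = 'lstm_states.h5'

    def on_new_sample(sample):
        # restore the state left by the previous sample, if there is one
        if os.path.exists(STATE_FILE):
            loadStates(predictingModel, STATE_FILE)

        x = np.array(sample).reshape(1, 1, 3)
        prob = predictingModel.predict_on_batch(x)[0, 0]

        # persist the updated state until the next sample arrives
        saveStates(predictingModel, STATE_FILE)
        return prob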

Working test for saving/loading states

import h5py, numpy as np
from keras.layers import RNN, LSTM, Dense, Input
from keras.models import Model
import keras.backend as K




def createModel():
    inp = Input(batch_shape=(1,None,3))
    out = LSTM(5,return_sequences=True, stateful=True)(inp)
    out = LSTM(2, stateful=True)(out)
    out = Dense(1)(out)
    model = Model(inp,out)
    return model


def saveStates(model, saveName):

    f = h5py.File(saveName,'w')

    for l, lay in enumerate(model.layers):
        #if you have nested models, consider making this recurrent testing for layers in layers
        if isinstance(lay,RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s), data=K.eval(stat), dtype=K.dtype(stat))

    f.close()


def loadStates(model, saveName):

    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())

    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)

        K.set_value(model.layers[layer].states[state], f.get(stateKey))

    f.close()

def printStates(model):

    for l in model.layers:
        #if you have nested models, consider making this recurrent testing for layers in layers
        if isinstance(l,RNN):
            for s in l.states:
                print(K.eval(s))   

model1 = createModel()
model2 = createModel()
model1.predict_on_batch(np.ones((1,5,3))) #changes model 1 states

print('model1')
printStates(model1)
print('model2')
printStates(model2)

saveStates(model1,'testStates5')
loadStates(model2,'testStates5')

print('model1')
printStates(model1)
print('model2')
printStates(model2)

Considerations on the aspects of the data

Your first model (if it is stateful=False) treats each of the m sequences as individual and not connected to the others. It also treats each batch as containing unique sequences.

If this is not the case, you might want to train a stateful model instead (considering that each sequence is actually connected to the previous one). Then you would need m batches of 1 sequence each, i.e. m x (1, 7 or None, 3).
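
A rough sketch of what that stateful training could look like (the epoch loop and numEpochs are assumptions; reset_states marks where the long stream starts over):

    # data: (m, 7, 3) where the m sequences are consecutive chunks of one long stream
    statefulTrainingModel = createModel(forTraining=False)   # stateful=True, batch size 1
    compileThisModel(statefulTrainingModel)

    for epoch in range(numEpochs):
        for i in range(len(data)):
            x = data[i:i+1]      # shape (1, 7, 3)
            y = target[i:i+1]    # shape (1, 1)
            statefulTrainingModel.train_on_batch(x, y)
        statefulTrainingModel.reset_states()   # the stream starts over at each epoch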

Answered Oct 13 '22 by Daniel Möller