
Stateful LSTM and stream predictions

I've trained an LSTM model (built with Keras and TF) on multiple batches of 7 samples with 3 features each, shaped like the sample below (the numbers are just placeholders for the purpose of explanation); each batch is labeled 0 or 1:

Data:

[
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   ...
]

i.e. batches of m sequences, each of length 7, whose elements are 3-dimensional vectors (so each batch has shape (m, 7, 3))

Target:

[
   [1]
   [0]
   [1]
   ...
]
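
For illustration, a minimal sketch (my own, not from the original post) of arrays with these shapes, using random placeholder values and an assumed m:

    import numpy as np

    m = 100                                    # number of sequences (placeholder)
    data = np.random.rand(m, 7, 3)             # m sequences, 7 samples each, 3 features
    target = np.random.randint(0, 2, (m, 1))   # one binary label per sequence

    print(data.shape)    # (m, 7, 3)
    print(target.shape)  # (m, 1)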

In my production environment, the data is a stream of samples, each with 3 features ([1,2,3], [1,2,3], ...). I would like to feed each sample to my model as it arrives and get an intermediate probability without waiting for the entire batch of 7 - see the animation below.

[Animation from the original post: samples streaming into the model one at a time.]

One of my thoughts was to pad the batch with zeros for the missing samples, e.g. [[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[1,2,3]], but that seems inefficient.
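
For reference, a rough sketch of that padding workaround (the helper name and the use of NumPy are my own assumptions; received holds the samples seen so far):

    import numpy as np

    def pad_partial_sequence(received, seq_len=7, num_features=3):
        # left-pad the samples received so far with zero vectors up to seq_len
        padded = np.zeros((1, seq_len, num_features))
        padded[0, seq_len - len(received):] = received
        return padded

    batch = pad_partial_sequence([[1, 2, 3]])   # shape (1, 7, 3), ready for prediction

Note that unless a Masking layer is used, the zero vectors are still processed by the LSTM and influence its state.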

I would appreciate any help pointing me in the right direction, both on saving the LSTM's intermediate state in a persistent way while waiting for the next sample, and on predicting with partial data on a model trained with a specific batch size.


Update, including model code:

    import keras_metrics                 # third-party package providing precision/recall metrics
    from keras import optimizers
    from keras.models import Sequential
    from keras.layers import LSTM, LeakyReLU, Dropout, Flatten, Dense

    # data is the (m, 7, 3) array described above; f1 is a custom metric defined elsewhere
    opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=10e-8, decay=0.001)
    model = Sequential()

    num_features = data.shape[2]
    num_samples = data.shape[1]

    first_lstm = LSTM(32, batch_input_shape=(None, num_samples, num_features),
                      return_sequences=True, activation='tanh')
    model.add(first_lstm)
    model.add(LeakyReLU())
    model.add(Dropout(0.2))
    model.add(LSTM(16, return_sequences=True, activation='tanh'))
    model.add(Dropout(0.2))
    model.add(LeakyReLU())
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer=opt,
                  metrics=['accuracy', keras_metrics.precision(),
                           keras_metrics.recall(), f1])

Model Summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 100, 32)           6272      
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 100, 32)           0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 100, 32)           0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 100, 16)           3136      
_________________________________________________________________
dropout_2 (Dropout)          (None, 100, 16)           0         
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 100, 16)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1600)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 1601      
=================================================================
Total params: 11,009
Trainable params: 11,009
Non-trainable params: 0
_________________________________________________________________
Asked Nov 07 '18 by Shlomi Schwartz



1 Answer

I think there might be an easier solution.

If your model does not have convolutional layers or any other layers that act upon the length/steps dimension, you can simply mark it as stateful=True.

Warning: your model does have layers that act on the length dimension!

The Flatten layer transforms the length dimension into a feature dimension. This will completely prevent you from achieving your goal. If the Flatten layer is expecting 7 steps, you will always need 7 steps.

So, before applying the rest of this answer, fix your model so it does not use the Flatten layer. Instead, just remove return_sequences=True from the last LSTM layer (return_sequences defaults to False), so that layer outputs only its final step.

The following code fixes that and also prepares a few things to be used in the rest of this answer:

def createModel(forTraining):

    #model for training: stateful=False, any batch size, full-length sequences
    if forTraining == True:
        batchSize = None
        length = num_samples
        stateful = False

    #model for predicting: stateful=True, fixed batch size of 1, variable steps
    #(you will feed one step at a time)
    else:
        batchSize = 1
        length = None
        stateful = True

    model = Sequential()

    first_lstm = LSTM(32, 
        batch_input_shape=(batchSize, length, num_features), 
        return_sequences=True, activation='tanh', 
        stateful=stateful)   

    model.add(first_lstm)
    model.add(LeakyReLU())
    model.add(Dropout(0.2))

    #this is the last LSTM layer, use return_sequences=False
    model.add(LSTM(16, return_sequences=False, stateful=stateful, activation='tanh'))

    model.add(Dropout(0.2))
    model.add(LeakyReLU())

    #don't add a Flatten!!!
    #model.add(Flatten())

    model.add(Dense(1, activation='sigmoid'))

    if forTraining == True:
        compileThisModel(model)

    return model

With this, you will be able to train with 7 steps and predict with one step. Otherwise it will not be possible.
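
compileThisModel and trainThisModel (used below) are placeholders in this answer; a minimal sketch of what they could look like, reusing the compile settings from the question (the extra metrics are trimmed, and the epochs/batch_size values are arbitrary assumptions):

    from keras import optimizers

    def compileThisModel(model):
        # same optimizer and loss as in the question; extra metrics omitted here
        opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=10e-8, decay=0.001)
        model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

    def trainThisModel(model):
        # data: (m, 7, 3) array, target: (m, 1) array, as described in the question
        model.fit(data, target, epochs=100, batch_size=32, shuffle=True)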

Using a stateful model as a solution to your question

First, train this new model again, because it has no Flatten layer:

trainingModel = createModel(forTraining=True)
trainThisModel(trainingModel)

Now, with this trained model, create a new model in exactly the same way, but with stateful=True in all of its LSTM layers, and copy the weights over from the trained model.

Since these new layers will need a fixed batch size (Keras' rules), I assumed it would be 1 (one single stream is coming, not m streams) and added it to the model creation above.

predictingModel = createModel(forTraining=False)
predictingModel.set_weights(trainingModel.get_weights())
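
If training and prediction happen in different processes (training offline, predicting in production), the same weight transfer can go through a file instead; a small sketch (the file name is an arbitrary assumption):

    # in the training process
    trainingModel.save_weights('lstm_weights.h5')

    # in the production process
    predictingModel = createModel(forTraining=False)
    predictingModel.load_weights('lstm_weights.h5')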

And voilà. Just predict the outputs of the model with a single step:

pseudo for loop as samples arrive to your model:
    prob = predictingModel.predict_on_batch(sample)

    #where sample.shape == (1, 1, 3)

When you decide that you have reached the end of what you consider a continuous sequence, call predictingModel.reset_states() so you can safely start a new sequence without the model thinking it should be appended to the end of the previous one.
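
Putting it together, a minimal sketch of such a production loop (sample_stream and is_end_of_sequence are hypothetical stand-ins for however your samples arrive and however you detect sequence boundaries):

    import numpy as np

    for sample in sample_stream:                       # each sample is e.g. [1, 2, 3]
        x = np.array(sample).reshape(1, 1, 3)          # batch of 1, single step, 3 features
        prob = predictingModel.predict_on_batch(x)[0, 0]
        print('intermediate probability:', prob)

        if is_end_of_sequence(sample):
            predictingModel.reset_states()             # start the next sequence fresh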


Saving and loading states

Just get and set them, saving with h5py:

def saveStates(model, saveName):

    f = h5py.File(saveName,'w')

    for l, lay in enumerate(model.layers):
        #if you have nested models, 
            #consider making this recurrent testing for layers in layers
        if isinstance(lay,RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s),
                                 data=K.eval(stat), 
                                 dtype=K.dtype(stat))

    f.close()


def loadStates(model, saveName):

    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())

    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)

        K.set_value(model.layers[layer].states[state], f.get(stateKey))

    f.close()
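
This is what allows persisting the LSTM state while waiting for the next sample, as asked in the question. A minimal sketch of how the two helpers could wrap each prediction (the file name and the on_new_sample hook are my own assumptions):

    import os
    import numpy as np

    STATE_FILE = 'lstm_states.h5'

    def on_new_sample(sample):
        # restore the state left by the previous sample, if there is one
        if os.path.exists(STATE_FILE):
            loadStates(predictingModel, STATE_FILE)

        x = np.array(sample).reshape(1, 1, 3)
        prob = predictingModel.predict_on_batch(x)[0, 0]

        # persist the updated state until the next sample arrives
        saveStates(predictingModel, STATE_FILE)
        return prob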

Working test for saving/loading states

import h5py, numpy as np
from keras.layers import RNN, LSTM, Dense, Input
from keras.models import Model
import keras.backend as K




def createModel():
    inp = Input(batch_shape=(1,None,3))
    out = LSTM(5,return_sequences=True, stateful=True)(inp)
    out = LSTM(2, stateful=True)(out)
    out = Dense(1)(out)
    model = Model(inp,out)
    return model


def saveStates(model, saveName):

    f = h5py.File(saveName,'w')

    for l, lay in enumerate(model.layers):
        #if you have nested models, consider making this recurrent testing for layers in layers
        if isinstance(lay,RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s), data=K.eval(stat), dtype=K.dtype(stat))

    f.close()


def loadStates(model, saveName):

    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())

    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)

        K.set_value(model.layers[layer].states[state], f.get(stateKey))

    f.close()

def printStates(model):

    for l in model.layers:
        #if you have nested models, consider making this recurrent testing for layers in layers
        if isinstance(l,RNN):
            for s in l.states:
                print(K.eval(s))   

model1 = createModel()
model2 = createModel()
model1.predict_on_batch(np.ones((1,5,3))) #changes model 1 states

print('model1')
printStates(model1)
print('model2')
printStates(model2)

saveStates(model1,'testStates5')
loadStates(model2,'testStates5')

print('model1')
printStates(model1)
print('model2')
printStates(model2)

Considerations on the aspects of the data

Your first model (if it is stateful=False) treats each of the m sequences as individual and not connected to the others. It also treats each batch as containing unique sequences.

If this is not the case, you might want to train a stateful model instead (considering that each sequence is actually connected to the previous one). Then you would need m batches of 1 sequence each, i.e. m x (1, 7 or None, 3).
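
A rough sketch of what that stateful training could look like (the epoch loop and numEpochs are assumptions; reset_states marks where the long stream starts over):

    # data: (m, 7, 3) where the m sequences are consecutive chunks of one long stream
    statefulTrainingModel = createModel(forTraining=False)   # stateful=True, batch size 1
    compileThisModel(statefulTrainingModel)

    for epoch in range(numEpochs):
        for i in range(len(data)):
            x = data[i:i+1]      # shape (1, 7, 3)
            y = target[i:i+1]    # shape (1, 1)
            statefulTrainingModel.train_on_batch(x, y)
        statefulTrainingModel.reset_states()   # the stream starts over at each epoch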

Answered Oct 13 '22 by Daniel Möller