
Train and predict on variable length sequences

Sensors (of the same type) scattered across my site report manually to my backend at irregular intervals. Between reports, each sensor aggregates events and reports them as a batch.

The following dataset is a collection of sequence event data, collected in batches. For example, sensor 1 reported twice: 2 events in the first batch and 3 events in the second, while sensor 2 reported once with 3 events.

I would like to use this data as my training data X:

sensor_id  batch_id  timestamp                  feature_1  feature_n
1          1         2020-12-21T00:00:00+00:00  0.54       0.33
1          1         2020-12-21T01:00:00+00:00  0.23       0.14
1          2         2020-12-21T03:00:00+00:00  0.51       0.13
1          2         2020-12-21T04:00:00+00:00  0.23       0.24
1          2         2020-12-21T05:00:00+00:00  0.33       0.44
2          1         2020-12-21T00:00:00+00:00  0.54       0.33
2          1         2020-12-21T01:00:00+00:00  0.23       0.14
2          1         2020-12-21T03:00:00+00:00  0.51       0.13

My target y is a score calculated from all the events collected by a sensor:
i.e. score_sensor_1 = f([[batch1...], [batch2...]])

sensor_id  final_score
1          0.8
2          0.6

I would like to predict y each time a batch is collected, i.e. 2 predictions for a sensor with 2 reports.


LSTM model:
I've started with an LSTM model, since I'm trying to predict on a time series of events. My first thought was to select a fixed input size and to zero-pad the input when the number of events collected is smaller than the input size, then mask the padded values:

model.add(Masking(mask_value=0., input_shape=(num_samples, num_features)))

For example:

sensor_id  batch_id  timestamp                  feature_1  feature_n
1          1         2020-12-21T00:00:00+00:00  0.54       0.33
1          1         2020-12-21T01:00:00+00:00  0.23       0.14

Would produce the following input if the selected length is 5:

[
 [0.54, 0.33],
 [0.23, 0.14],
 [0,0],
 [0,0],
 [0,0]
]
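A minimal sketch of what I had in mind for this padding + masking approach (layer sizes are just placeholders):

import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.sequence import pad_sequences

n_features = 2
max_len = 5  # fixed input length chosen up front

# e.g. one report with 2 events and one with 3 events
reports = [np.random.random((2, n_features)), np.random.random((3, n_features))]

# zero-pad (and truncate) every report to max_len time steps -> shape (2, 5, 2)
x = pad_sequences(reports, maxlen=max_len, dtype='float32',
                  padding='post', truncating='post')

model = models.Sequential([
    layers.Masking(mask_value=0., input_shape=(max_len, n_features)),
    layers.LSTM(16),
    layers.Dense(1),
])
model.compile(loss='mse', optimizer='adam')
# y would be the per-report score, e.g. np.array([0.8, 0.8])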

However, the variance in the number of events per sensor report in my training data is large: one report could collect 1,000 events while another collects only 10. So if I select the average size (say 200), some inputs would carry a lot of padding, while others would be truncated and data would be lost.

I've heard about ragged tensors, but I'm not sure they fit my use case. How would one approach such a problem?

asked Dec 21 '20 by Shlomi Schwartz


2 Answers

I don't have the specifics of your model, but the TF implementation of LSTM usually expects (batch, seq, features) as input.

Now let's assume this is one of your batch_ids:

data = np.zeros((15,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

You could reshape it to (1, 15, 5) and feed it to the model, but any time your batch_id length varies, your sequence length will vary too, while your model expects a fixed sequence length.

Instead, you could reshape your data before training so that the length of each batch_id is passed as the batch size:

data = data[:,np.newaxis,:] 

array([[[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]]])

Same data, with shape (15, 1, 5), but your model would now be looking at a fixed sequence length of 1 while the number of samples varies.

Make sure to reshape your label as well.

To my knowledge, since the RNN/LSTM is applied at each time step and the state is only reset between batches, this should not impact the model's behavior.
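
As a minimal sketch of this idea (layer sizes and data here are just placeholders), one report of 15 events becomes 15 samples of sequence length 1, and the report's score is repeated once per event:

import numpy as np
from tensorflow.keras import layers, models

n_features = 5
events = np.random.random((15, n_features))   # one report: 15 events
score = 0.8                                   # that report's target score

x = events[:, np.newaxis, :]                  # (15, 1, 5): 15 samples, seq length 1
y = np.repeat(score, len(events))             # (15,): the same label for each event

model = models.Sequential([
    layers.LSTM(10, input_shape=(1, n_features)),
    layers.Dense(1),
])
model.compile(loss='mse', optimizer='adam')
model.fit(x, y, epochs=1)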

answered by Yoan B. M.Sc


Working with variable-sized input sequences is quite simple. While there is a restriction that sequences within each batch must be the same size, there is NO restriction on having variable-sized sequences between batches. Using this to your advantage, you can simply set the input shape of the LSTM to (None, features) and use a batch_size of 1.

Let's create a generator that produces variable-length sequences of 2 features together with a random float score (the target you want as a function of each sequence), similar to your sensor data.

import numpy as np

# Infinitely creates batches of dummy data
def generator():
    while True:
        length = np.random.randint(2, 10)           # variable-length sequences
        x_train = np.random.random((1, length, 2))  # (batch, seq, features)
        y_train = np.random.random((1, 1))          # (batch, score)
        yield x_train, y_train

next(generator())
#x.shape = (1,4,2), y.shape = (1,1)
(array([[[0.63841991, 0.91141833],
         [0.73131801, 0.92771373],
         [0.61298585, 0.6455549 ],
         [0.25893925, 0.40202978]]]),
 array([[0.05934613]]))

Above is an example of a length-4 sequence created by the generator, while the next is a length-9 one.

next(generator())
#x.shape = (1,9,2), y.shape = (1,1)
(array([[[0.76006158, 0.27457503],
         [0.57739596, 0.75416962],
         [0.03029365, 0.29339812],
         [0.93866829, 0.79137367],
         [0.52739961, 0.11475738],
         [0.85832651, 0.19247399],
         [0.37098216, 0.48703114],
         [0.95846681, 0.15507787],
         [0.86945015, 0.70949593]]]),
 array([[0.02560889]]))

Now, let's create an LSTM based neural net that can work with these variable-sized sequences for each batch.

from tensorflow.keras import layers, Model, utils

inp = layers.Input((None, 2))
x = layers.LSTM(10, return_sequences=True)(inp)
x = layers.LSTM(10)(x)
out = layers.Dense(1)(x)

model = Model(inp, out)
utils.plot_model(model, show_layer_names=False, show_shapes=True)

(model diagram produced by plot_model)

Training this with an effective batch size of 1 (each generator yield is a single sequence):

model.compile(loss='binary_crossentropy', optimizer='adam')

# steps_per_epoch stops the otherwise infinite generator after 100 batches per epoch;
# batch_size is not passed because the generator already yields batches
model.fit(generator(), steps_per_epoch=100, epochs=10)
Epoch 1/10
100/100 [==============================] - 1s 5ms/step - loss: 1.5145
Epoch 2/10
100/100 [==============================] - 0s 5ms/step - loss: 0.7435
Epoch 3/10
100/100 [==============================] - 0s 4ms/step - loss: 0.7885
Epoch 4/10
100/100 [==============================] - 0s 4ms/step - loss: 0.7384
Epoch 5/10
100/100 [==============================] - 0s 4ms/step - loss: 0.7139
Epoch 6/10
100/100 [==============================] - 0s 5ms/step - loss: 0.7462
Epoch 7/10
100/100 [==============================] - 0s 4ms/step - loss: 0.7173
Epoch 8/10
100/100 [==============================] - 0s 4ms/step - loss: 0.7116
Epoch 9/10
100/100 [==============================] - 0s 4ms/step - loss: 0.6875
Epoch 10/10
100/100 [==============================] - 0s 4ms/step - loss: 0.7153

This is how you can work with variable-sized sequences as inputs. Padding/masking is only necessary for sequences that are part of the same batch.
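
Prediction works the same way: each report is passed on its own with shape (1, length, features), so different calls can use different lengths. For example, with the model trained above:

# predict one report at a time; each call can use a different sequence length
report_a = np.random.random((1, 4, 2))   # batch of 1, 4 events, 2 features
report_b = np.random.random((1, 9, 2))   # batch of 1, 9 events, 2 features

print(model.predict(report_a).shape)     # (1, 1)
print(model.predict(report_b).shape)     # (1, 1)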

Now, you could create a generator for your input data that generates one sequence of events as input to the model at one time, in which case you do not need to specify the batch_size explicitly since you are generating one sequence at a time already.

Do not specify the batch_size if your data is in the form of datasets, generators, or keras.utils.Sequence instances (since they generate batches).
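
For the dataset in the question, such a generator could look roughly like this (a sketch only, assuming a pandas DataFrame df with the X columns shown above and a scores DataFrame holding the sensor_id/final_score targets; the names are illustrative):

import numpy as np
import pandas as pd

feature_cols = ['feature_1', 'feature_n']

def sensor_batch_generator(df, scores):
    # yields one (1, length, features) sequence and its (1, 1) score per report
    score_map = scores.set_index('sensor_id')['final_score']
    while True:
        for (sensor_id, batch_id), report in df.groupby(['sensor_id', 'batch_id']):
            x = report.sort_values('timestamp')[feature_cols].to_numpy(dtype='float32')
            y = np.array([[score_map[sensor_id]]], dtype='float32')
            yield x[np.newaxis, ...], y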

Or you could use the ragged tensors you mentioned and provide a batch size of 1 for each sequence. Personally, I prefer working with generators for training data, as they give you a lot more flexibility in pre-processing as well.
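
For reference, a minimal ragged-tensor version might look like this (a sketch; layer sizes are placeholders, and it needs a reasonably recent TF 2.x where Keras RNN layers accept ragged inputs):

import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(None, 2), ragged=True)
x = layers.LSTM(10)(inp)
out = layers.Dense(1)(x)
ragged_model = Model(inp, out)
ragged_model.compile(loss='mse', optimizer='adam')

# two sequences of different lengths in a single ragged batch
x_train = tf.ragged.constant(
    [[[0.1, 0.2], [0.3, 0.4]],
     [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]],
    ragged_rank=1)
y_train = tf.constant([[0.8], [0.6]])
ragged_model.fit(x_train, y_train, epochs=1)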

Interestingly, you could optimize this code further by bundling same-length sequences together into batches and then passing a variable batch size. This would help if you have tons of data and can't afford to run a batch size of 1 for each gradient update!
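
One way to do that is length-bucketing with tf.data. Here is a rough sketch reusing the dummy generator above (Dataset.bucket_by_sequence_length exists as a method in recent TF releases, older versions expose it via tf.data.experimental; the bucket boundaries and batch sizes are placeholders):

import tensorflow as tf

def single_sequences():
    for x, y in generator():
        # drop the leading batch dimension and ensure float32
        yield x[0].astype('float32'), y[0].astype('float32')

ds = tf.data.Dataset.from_generator(
    single_sequences,
    output_signature=(
        tf.TensorSpec(shape=(None, 2), dtype=tf.float32),
        tf.TensorSpec(shape=(1,), dtype=tf.float32),
    ),
)

# group sequences of similar length; each batch is padded only up to its
# longest member, so adding a Masking layer is still advisable unless the
# buckets each contain exactly one length
ds = ds.bucket_by_sequence_length(
    element_length_func=lambda x, y: tf.shape(x)[0],
    bucket_boundaries=[5, 8],         # buckets: <5, 5-7, >=8 time steps
    bucket_batch_sizes=[32, 16, 8],   # batch size used for each bucket
)

model.fit(ds, steps_per_epoch=100, epochs=1)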

Another word of caution! If your sequences are extremely long, then I would recommend using Truncated Backpropagation through time (TBPTT) (find details here).

Hope this solves what you are looking for.

answered by Akshay Sehgal