I'm trying to update a Keras LSTM to deal with concept drift. To do so, I'm following the approach proposed in this paper [1], in which an anomaly score is computed and used to update the network weights. In the paper, the anomaly score is computed with the L2 norm and the model weights are then updated. As stated in the paper:
RNN Update: The anomaly score a_t is then used to update the network W_{t-1} to obtain W_t using backpropagation through time (BPTT):
W_t = W_{t-1} − η ∇a_t(W_{t-1}), where η is the learning rate.
I'm trying to update the LSTM network weights, and although I have seen some improvement in the model's performance when forecasting multi-step-ahead, multi-sensor data, I'm not sure whether the improvement comes from actually handling concept drift or simply from refitting the model on the newest data.
Here is an example model:
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(n_neurons, input_shape=(n_seq, n_features)))
model.add(tf.keras.layers.Dense(n_pred_seq * n_features))
model.add(tf.keras.layers.Reshape((n_pred_seq, n_features)))
model.compile(optimizer='adam', loss='mse')
And here is how I'm updating the model:
from math import sqrt
from sklearn.metrics import mean_squared_error

y_pred = model.predict_on_batch(x_batch)
up_y = data_y[i:i+1]  # newest target window, keeping the batch dimension for fit()
a_score = sqrt(mean_squared_error(data_y[i,].flatten(), y_pred[0, :].flatten()))  # anomaly score (RMSE)
w = model.layers[0].get_weights()  # only get weights for the LSTM layer
for l in range(len(w)):
    w[l] = w[l] - (w[l] * 0.001 * a_score)  # 0.001 = learning rate
model.layers[0].set_weights(w)
model.fit(x_batch, up_y, epochs=1, verbose=1)
model.reset_states()
I'm wondering whether this is the correct way to update the LSTM network, and how BPTT is applied after the weights are updated.
P.S.: I have also seen other methods for detecting concept drift, such as ADWIN from the skmultiflow package, but I found this one especially interesting because it also deals with anomalies: the model is updated slightly when new data with concept drift arrives, and the update is almost ignored when anomalous data arrives.
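For context, this is roughly how I understand ADWIN would be used on a stream of prediction errors (a minimal sketch assuming skmultiflow's drift_detection API; the error values below are placeholders, not real data):

import numpy as np
from skmultiflow.drift_detection import ADWIN

adwin = ADWIN(delta=0.002)  # delta controls the detector's sensitivity

# Placeholder error stream; in practice this would be the per-window
# prediction error of the LSTM.
prediction_errors = np.abs(np.random.randn(1000))

for i, error in enumerate(prediction_errors):
    adwin.add_element(error)
    if adwin.detected_change():
        print(f"Possible concept drift at index {i}; the model could be refitted here")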
[1] Saurav, S., Malhotra, P., TV, V., Gugulothu, N., Vig, L., Agarwal, P., & Shroff, G. (2018, January). Online Anomaly Detection with Concept Drift Adaptation using Recurrent Neural Networks. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (pp. 78-87). ACM.
I personally think it's a valid method. Whether updating the network weights this way makes sense depends on your use case, but doing it the way you describe is fine.
Maybe another way to do it is to implement your own loss function and embed the anti-drift parameter into it, but it might be a little complicated.
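Something along these lines, for example (just a rough sketch; the anomaly_weight variable and the way it is updated are my own assumption, not from the paper; model and a_score refer to your code above):

import tensorflow as tf

# Non-trainable variable holding the current anomaly weight; it would be
# updated from outside (e.g. from the latest anomaly score) before each fit.
anomaly_weight = tf.Variable(1.0, trainable=False)

def drift_aware_mse(y_true, y_pred):
    # Plain MSE scaled by the current anomaly weight, so anomalous
    # batches contribute less to the weight update.
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    return anomaly_weight * mse

model.compile(optimizer='adam', loss=drift_aware_mse)

# Before each incremental fit, e.g.: anomaly_weight.assign(1.0 / (1.0 + a_score))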
Regarding BPTT, I think it's applied as normal, but with different "starting points": the weights you've just updated.
Looking at the second block of your code, I believe you are not calculating the gradient properly. Specifically, the gradient update w[l] = w[l] - (w[l]*0.001*a_score) seems to be wrong to me.
Here you are multiplying the weights by the anomaly score. However, the original update equation, W_t = W_{t-1} − η ∇a_t(W_{t-1}), means computing the gradient of the loss a_t with respect to W_{t-1}; it does not mean multiplying a_t by W_{t-1}.
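As a rough sketch of what that gradient step looks like with plain SGD (variable names follow your code above; Keras would normally do this for you through its optimizer):

import tensorflow as tf

eta = 0.001  # learning rate

with tf.GradientTape() as tape:
    y_pred = model(x_batch, training=True)
    # The loss plays the role of a_t in the update equation.
    loss = tf.reduce_mean(tf.square(up_y - y_pred))

grads = tape.gradient(loss, model.trainable_variables)
for var, grad in zip(model.trainable_variables, grads):
    var.assign_sub(eta * grad)  # W_t = W_{t-1} - eta * grad of a_t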
To apply the online update correctly, you just need to sample your stream sequentially and call model.fit() as usual.
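For instance, something like this (a minimal sketch; data_x/data_y follow the naming in your question):

import numpy as np

for i in range(len(data_x)):
    x_batch = data_x[i:i+1]  # newest input window, keeping the batch dimension
    y_batch = data_y[i:i+1]  # corresponding target window

    # Optional: monitor the anomaly score, but do not fold it into the weights.
    y_pred = model.predict_on_batch(x_batch)
    a_score = float(np.sqrt(np.mean((y_batch - y_pred) ** 2)))

    # One gradient step on the newest data; Keras runs BPTT internally.
    model.fit(x_batch, y_batch, epochs=1, batch_size=1, verbose=0)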
Hope this helps.