In a database, there is time-series data with records grouped per device. For each device:
- timestamp
- temperature
- min limit
- max limit
(and the same fields for every other device)
For every device, there are 4 hours of time-series data (at 5-minute intervals) before an alarm was raised, and 4 hours of time-series data (again at 5-minute intervals) that didn't raise any alarm. This graph better describes how the data is represented for every device:
[Graph: representation of the data for every device (image not available)]
I need to use an RNN class in Python for alarm prediction. We define an alarm as the temperature going below the min limit or above the max limit.
After reading the official TensorFlow documentation here, I'm having trouble understanding how to set up the input to the model. Should I normalise the data beforehand, and if so, how?
Reading the answers here didn't give me a clear view of how to transform my data into a format the RNN model accepts, either.
Any help on what X and Y in model.fit should look like for my case?
If you see any other issue with this problem, feel free to comment.
PS: I have already set up Python in Docker with TensorFlow, Keras, etc., in case this information helps.
An RNN has two inputs: the present and the recent past. This matters because the order of the data carries crucial information about what comes next, which is why an RNN can capture patterns that many other algorithms can't.
Input to an RNN: the input data should have 3 dimensions (batch size, sequence length, input dimension). Batch size is the number of samples sent to the model at a time. In this example, the batch size is 2, but you can use 4, 8, 16, 32, 64, etc., depending on available memory (typically a power of 2).
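To make the three dimensions concrete, here is a minimal numpy sketch of slicing a 3-D input array into batches. The sizes (10 samples, 48 timesteps for 4 hours at 5-minute intervals, 1 feature) and the helper `iter_batches` are hypothetical; in Keras you normally don't batch by hand, because `model.fit` takes a `batch_size` argument and does this for you.

```python
import numpy as np

# Hypothetical sizes: 10 samples, 48 timesteps (4 hours at 5-minute intervals),
# 1 feature (temperature) per timestep.
num_samples, seq_len, input_dim = 10, 48, 1
X = np.random.default_rng(0).normal(size=(num_samples, seq_len, input_dim))

def iter_batches(X, batch_size):
    """Yield consecutive slices of X along the first (batch) axis."""
    for start in range(0, len(X), batch_size):
        # Each batch keeps the 3-D layout (batch, seq_len, input_dim);
        # the last batch may be smaller than batch_size.
        yield X[start:start + batch_size]

batch_shapes = [b.shape for b in iter_batches(X, batch_size=4)]
print(batch_shapes)  # [(4, 48, 1), (4, 48, 1), (2, 48, 1)]
```

Only the first axis changes between batches; sequence length and input dimension stay fixed.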
If your model is overfitting (training loss much lower than validation loss), the solutions are to decrease your network size or to increase dropout; for example, you could try a dropout of 0.5. If your training and validation losses are about equal, your model is likely underfitting: increase the size of your model (either the number of layers or the number of neurons per layer).
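As a sketch of what a dropout rate of 0.5 does, here is inverted dropout written in plain numpy. This is an illustration of the technique, not Keras's internal code; in Keras you would use a `Dropout` layer or the `dropout` / `recurrent_dropout` arguments of the recurrent layers. The function name `dropout` and the example activations are hypothetical.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` during training
    and scale the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    if not training or rate == 0.0:
        return x  # at inference time dropout is a no-op
    keep = rng.random(x.shape) >= rate  # True for units that survive
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(42)
h = np.ones((2, 8))                 # hypothetical hidden activations
out = dropout(h, rate=0.5, training=True, rng=rng)
print(out)                          # every entry is either 0.0 or 2.0
print(dropout(h, 0.5, training=False, rng=rng) is h)  # True
```

Randomly silencing units during training keeps the network from relying on any single unit, which is why it helps against overfitting.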
U, V and W are the weights of the hidden layer, the output layer and the hidden state, respectively. $x_t$ and $o_t$ are the input vector and the output at time $t$, respectively.
torch.nn.RNN takes two inputs, input and h_0, i.e. the input sequence and the hidden state at t=0. If we don't initialize the hidden state, PyTorch initializes it to all zeros. input is the sequence fed into the network; with the default batch_first=False, it should have shape (seq_len, batch, input_size).
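Keras recurrent layers expect (batch, timesteps, features), while torch.nn.RNN with its default batch_first=False expects (seq_len, batch, input_size). The numpy sketch below shows the axis swap and the shape of a zero h_0. PyTorch itself is not imported, the sizes are hypothetical, and a single-layer unidirectional RNN is assumed. Constructing the RNN with batch_first=True is an alternative that avoids the transpose.

```python
import numpy as np

batch, seq_len, input_size, hidden_size, num_layers = 4, 48, 1, 16, 1

# Batch-first layout (batch, seq_len, input_size), the same layout Keras uses.
x_batch_first = np.zeros((batch, seq_len, input_size), dtype=np.float32)

# torch.nn.RNN with batch_first=False expects (seq_len, batch, input_size),
# so swap the first two axes.
x_seq_first = np.transpose(x_batch_first, (1, 0, 2))

# The zero initial hidden state PyTorch uses when h_0 is omitted has shape
# (num_layers * num_directions, batch, hidden_size); one direction assumed here.
h_0 = np.zeros((num_layers, batch, hidden_size), dtype=np.float32)

print(x_seq_first.shape, h_0.shape)  # (48, 4, 1) (1, 4, 16)
```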
Weights: the RNN has input-to-hidden connections parameterized by a weight matrix U, hidden-to-hidden recurrent connections parameterized by a weight matrix W, and hidden-to-output connections parameterized by a weight matrix V. All these weights (U, V, W) are shared across time. Output: $o_t$ is the output of the network at time $t$.
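A minimal numpy sketch of a vanilla RNN forward pass makes "shared across time" concrete: the same U, W and V are applied at every step of the loop. The tanh activation, the bias vectors b and c, the zero initial state, and the sizes are assumptions for illustration.

```python
import numpy as np

def rnn_forward(xs, U, W, V, b, c):
    """Vanilla RNN forward pass over one sequence.

    xs: (seq_len, input_dim). The same U, W, V (and biases b, c) are reused at
    every time step, which is what "shared across time" means.
    """
    h = np.zeros(W.shape[0])              # h_0 = 0
    outputs = []
    for x_t in xs:
        h = np.tanh(U @ x_t + W @ h + b)  # input-to-hidden plus hidden-to-hidden
        outputs.append(V @ h + c)         # hidden-to-output
    return np.stack(outputs)              # (seq_len, output_dim)

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, seq_len = 1, 8, 1, 48
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b, c = np.zeros(hidden_dim), np.zeros(output_dim)

o = rnn_forward(rng.normal(size=(seq_len, input_dim)), U, W, V, b, c)
print(o.shape)  # (48, 1)
```

The number of parameters does not grow with sequence length, since the loop only reuses the three matrices.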
An RNN can handle sequential data, accepting both the current input and previously received inputs. RNNs can remember previous inputs thanks to their internal memory (the hidden state).
How do recurrent neural networks work?
The RNN forward pass can thus be represented by a set of recurrence equations. This is an example of a recurrent network that maps an input sequence to an output sequence of the same length. The total loss for a given sequence of x values paired with a sequence of y values is then just the sum of the losses over all the time steps.
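One common way to write these equations, using the weight names U, V, W defined above, is the standard vanilla-RNN formulation below. The tanh hidden activation, the softmax output, the bias vectors $b$ and $c$, and the sequence length $\tau$ are assumptions, since the exact equations are not given here.

$$
\begin{aligned}
a_t &= b + W h_{t-1} + U x_t \\
h_t &= \tanh(a_t) \\
o_t &= c + V h_t \\
\hat{y}_t &= \operatorname{softmax}(o_t)
\end{aligned}
\qquad\qquad
L = \sum_{t=1}^{\tau} L_t
$$

Here $L_t$ is the loss at step $t$ (for example the negative log-likelihood of $y_t$ under $\hat{y}_t$), and summing over $t$ gives the total loss for the sequence.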
You can begin with the snippet you mention in the question.
Any help on how the X and Y in model.fit should look like for my case?
X should be a numpy array of shape [num samples, sequence length, D], where D is the number of values per timestamp. I suppose D=1 in your case, because you only pass the temperature value.
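A minimal sketch of building X from the data described in the question, assuming each 4-hour segment (48 readings at 5-minute intervals) becomes one sample and only temperature is used (D=1). The helper `build_X` and the random example windows are hypothetical.

```python
import numpy as np

SEQ_LEN = 4 * 60 // 5  # 4 hours at 5-minute intervals -> 48 timesteps

def build_X(windows):
    """Stack equal-length temperature windows into shape (num_samples, SEQ_LEN, 1)."""
    arr = np.asarray(windows, dtype=np.float32)   # (num_samples, SEQ_LEN)
    if arr.ndim != 2 or arr.shape[1] != SEQ_LEN:
        raise ValueError(f"expected windows of length {SEQ_LEN}, got shape {arr.shape}")
    return arr[..., np.newaxis]                   # add the feature axis: D = 1

# Hypothetical data: one "before alarm" and one "no alarm" window for each of 3 devices.
rng = np.random.default_rng(0)
windows = [20 + rng.normal(size=SEQ_LEN) for _ in range(6)]
X = build_X(windows)
print(X.shape)  # (6, 48, 1)
```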
y should be a vector of target values (as in the snippet): either binary (alarm / not alarm) or continuous (e.g. maximum temperature deviation). In the latter case, you'd need to replace the sigmoid activation with something else (e.g. a linear activation).
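A sketch of the two target options, assuming the rows of X are ordered with all "before alarm" windows first and all "no alarm" windows after, so y must follow the same order. `is_alarm` encodes the alarm definition from the question; `max_deviation` is one hypothetical way to define the continuous target, and which readings it should be computed over is left to you.

```python
import numpy as np

def is_alarm(temperature, min_limit, max_limit):
    """Alarm definition from the question: temperature below min or above max."""
    t = np.asarray(temperature)
    return bool(np.any((t < min_limit) | (t > max_limit)))

def max_deviation(temperature, min_limit, max_limit):
    """Hypothetical continuous target: how far the temperature went outside the limits (0 if never)."""
    t = np.asarray(temperature, dtype=float)
    below = np.clip(min_limit - t, 0.0, None)
    above = np.clip(t - max_limit, 0.0, None)
    return float(np.max(np.maximum(below, above)))

# Binary targets: 1 for windows recorded before an alarm, 0 for windows without one.
n_alarm, n_no_alarm = 3, 3
y = np.concatenate([np.ones(n_alarm), np.zeros(n_no_alarm)]).astype(np.float32)
print(y)  # [1. 1. 1. 0. 0. 0.]

print(is_alarm([21.0, 25.5], min_limit=18.0, max_limit=25.0))       # True
print(max_deviation([17.0, 26.5], min_limit=18.0, max_limit=25.0))  # 1.5
```

With binary y, a single sigmoid output unit fits; with the continuous target, a linear output is the usual choice.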
Should i normalise the data beforehand
Yes, it's essential to preprocess your raw data. I see 2 crucial things to do here:
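One common preprocessing step for data like this (an assumption on our part) is standardization: compute the mean and standard deviation on the training windows only, then apply those same statistics to validation and test data to avoid leakage. A second hypothetical option is to express each temperature relative to the device's own limits, so that values outside [0, 1] correspond to the alarm condition. All function names here are hypothetical.

```python
import numpy as np

def fit_standardizer(X_train):
    """Compute per-feature mean and std over all training samples and timesteps."""
    mean = X_train.mean(axis=(0, 1), keepdims=True)
    std = X_train.std(axis=(0, 1), keepdims=True)
    return mean, np.where(std == 0, 1.0, std)  # avoid division by zero

def standardize(X, mean, std):
    return (X - mean) / std

def scale_to_limits(temperature, min_limit, max_limit):
    """Alternative: 0 means 'at min limit', 1 means 'at max limit';
    values outside [0, 1] mean the temperature crossed a limit."""
    return (np.asarray(temperature, dtype=float) - min_limit) / (max_limit - min_limit)

rng = np.random.default_rng(0)
X_train = 20 + 3 * rng.normal(size=(8, 48, 1))  # hypothetical training windows
X_val = 20 + 3 * rng.normal(size=(2, 48, 1))

mean, std = fit_standardizer(X_train)     # statistics from training data only
X_train_n = standardize(X_train, mean, std)
X_val_n = standardize(X_val, mean, std)   # reuse training statistics

print(scale_to_limits([18.0, 21.5, 25.0], min_limit=18.0, max_limit=25.0))  # [0.  0.5 1. ]
```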
Finally, I'd say this task is more complex than it seems. You might want to find either a good starter tutorial on time-series classification or a course on machine learning in general. I believe you can find a better method than an RNN.