 

Setting the correct input for an RNN

I have time-series data in a database, with records of the form:

  • device - timestamp - temperature - min limit - max limit
  • device - timestamp - temperature - min limit - max limit
  • device - timestamp - temperature - min limit - max limit
  • ...

For every device there are 4 hours of time-series data (at 5-minute intervals) recorded before an alarm was raised, and 4 hours of time-series data (again at 5-minute intervals) that did not raise any alarm. The graph below illustrates the data for each device:

[image: graph of the time-series data for each device]

I need to use an RNN in Python for alarm prediction. An alarm is defined as the temperature going below the min limit or above the max limit.

After reading the official TensorFlow documentation here, I'm having trouble understanding how to set the input to the model. Should I normalise the data beforehand, and if so, how?

Reading the answers here also didn't give me a clear view of how to transform my data into an acceptable format for the RNN model.

What should the X and Y in model.fit look like for my case?

If you see any other issue with this problem, feel free to comment.

PS. I have already set up Python in Docker with TensorFlow, Keras, etc., in case this information helps.

asked Aug 03 '20 10:08 by GeorgeGeorgitsis




1 Answer

You can begin with the snippet you mention in the question.

What should the X and Y in model.fit look like for my case?

X should be a NumPy array of shape (num_samples, sequence_length, D), where D is the number of values per timestep. I suppose D=1 in your case, because you only pass the temperature value.

y should be a vector of target values (as in the snippet), one per sample. It can be either binary (alarm/not_alarm) or continuous (e.g. max temperature deviation). In the latter case you'd need to replace the sigmoid activation with something else, such as a linear output.
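To make the shapes concrete, here is a minimal sketch of how X and y could be assembled with NumPy. It assumes each device contributes one 48-step window that preceded an alarm and one that did not (4 hours at 5-minute intervals), and that only temperature is used, so D=1. The helper name build_dataset and the random data are made up for illustration; they are not from the question or from any library.

```python
import numpy as np

STEPS = 48  # 4 hours at 5-minute intervals -> 48 readings per window


def build_dataset(alarm_windows, normal_windows):
    """Stack per-device temperature windows into X and y for model.fit.

    Both arguments are hypothetical collections of 1-D sequences, each
    holding STEPS temperature readings for one device.
    """
    windows = list(alarm_windows) + list(normal_windows)
    X = np.asarray(windows, dtype=np.float32)
    if X.ndim != 2 or X.shape[1] != STEPS:
        raise ValueError(f"every window must contain exactly {STEPS} readings")
    # Add the trailing feature axis: (num_samples, STEPS) -> (num_samples, STEPS, 1)
    X = X[..., np.newaxis]
    # Binary targets: 1 for windows that preceded an alarm, 0 otherwise
    y = np.concatenate([
        np.ones(len(alarm_windows), dtype=np.float32),
        np.zeros(len(normal_windows), dtype=np.float32),
    ])
    return X, y


# Synthetic stand-in data for 3 devices (not real measurements)
rng = np.random.default_rng(0)
alarm = rng.normal(8.0, 1.0, size=(3, STEPS))
normal = rng.normal(5.0, 1.0, size=(3, STEPS))
X, y = build_dataset(alarm, normal)
print(X.shape, y.shape)  # (6, 48, 1) (6,)
```

Arrays built this way can be passed to model.fit(X, y, ...) for a binary classifier with a sigmoid output. If you also feed the min and max limits as features, D becomes 3 and the last axis grows accordingly.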

Should I normalise the data beforehand

Yes, it's essential to preprocess your raw data. I see two crucial things to do here:

  1. Normalise the temperature values with min-max scaling or standardization (wiki, sklearn preprocessing). I'd also add a bit of smoothing.
  2. Drop the last few timesteps from every time series to avoid information leakage. Otherwise the readings right before the alarm give the answer away.
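The two steps above can be sketched in plain NumPy as follows. This is only an illustration under assumptions: the function name preprocess, the number of dropped timesteps (6, i.e. 30 minutes) and the moving-average width (3) are arbitrary choices, not recommendations. The scaling bounds are computed on the training set and reused for the test set, so no test-set statistics leak into training.

```python
import numpy as np


def preprocess(X, drop_last=6, smooth=3, lo=None, hi=None):
    """Truncate, smooth and min-max scale windows of shape (n, steps, 1).

    drop_last and smooth are illustrative defaults, not recommendations.
    lo/hi are the scaling bounds; pass the values computed on the training
    set when transforming validation or test data.
    """
    X = np.asarray(X, dtype=np.float32)
    # 1. Drop the final readings so the model cannot see the alarm itself
    X = X[:, : X.shape[1] - drop_last, :]
    # 2. Moving-average smoothing along the time axis ("valid" mode
    #    shortens each window by smooth - 1 steps)
    kernel = np.ones(smooth, dtype=np.float32) / smooth
    X = np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="valid"), 1, X
    )
    # 3. Min-max scaling; bounds come from this data unless supplied
    if lo is None or hi is None:
        lo, hi = float(X.min()), float(X.max())
    if hi == lo:
        raise ValueError("cannot min-max scale constant data")
    return (X - lo) / (hi - lo), lo, hi


# Synthetic stand-in data (not real measurements)
rng = np.random.default_rng(0)
X_train = rng.normal(5.0, 1.0, size=(6, 48, 1))
X_test = rng.normal(5.0, 1.0, size=(2, 48, 1))

X_train_p, lo, hi = preprocess(X_train)
X_test_p, _, _ = preprocess(X_test, lo=lo, hi=hi)
print(X_train_p.shape)  # (6, 40, 1)
```

sklearn.preprocessing offers MinMaxScaler and StandardScaler for the scaling step if you prefer not to write it by hand. They expect 2-D input, so the windows would need to be reshaped before and after.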

Finally, I'd say that this task is more complex than it seems. You might want to find either a good starter tutorial on time-series classification or a course on machine learning in general. I believe you can find a better method than an RNN.

answered Nov 10 '22 22:11 by roman