 

LSTM Autoencoder no progress when script is running on larger dataset

The shape of p_input in this LSTM autoencoder for "test.py" is (128, 8, 1), meaning 128 sets of 8 digits. I am trying to adapt this model to time-series data with 4 sets of 25,000 time steps (basically 0 seconds to 25,000 seconds). I tried feeding this dataset into p_input with the shape (4, 25000, 1) and no errors occurred. However, when I run the script, instead of getting iter 1: 0.01727, iter 2: 0.00983, ..., I do not get any printed feedback from the script, so I assume something is holding it up. I have also tried just changing batch_num to 4 and step_num to 25,000 directly in the unedited "test.py" file, with the same result: no printed feedback.

My thought is that in "test.py", p_inputs is taking too long to compute the tf.split and tf.squeeze operations (presumably built something like the sketch below). Another thought is that I might need to increase the number of hidden LSTM units in hidden_num and/or increase the number of epochs (iteration). It could also be that batch_num has to be greater than step_num; I tried "test.py" with step_num = 4 and batch_num = 25000, and the script ran normally with printed feedback.
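
For reference, my reading is that "test.py" builds p_inputs roughly like this (the exact code may differ; the shapes below are just my setup):

import tensorflow as tf

batch_num, step_num, elem_num = 4, 25000, 1
p_input = tf.placeholder(tf.float32, [batch_num, step_num, elem_num])
# One tensor of shape (batch_num, elem_num) per time step;
# with step_num = 25000 this creates 25,000 graph operations.
p_inputs = [tf.squeeze(t, axis=1) for t in tf.split(p_input, step_num, axis=1)]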

Let me know your thoughts on what might be holding up the script.

asked Aug 22 '17 by Julian Rachman


1 Answer

The second dimension of your input is the number of time steps over which the network gets unrolled to compute gradients with the BPTT (backpropagation through time) algorithm.

The idea is that a recurrent network (like the LSTM) is transformed into a feedforward network by "unrolling" each time step as a new layer of the network.

When you provide the entire time series at once (i.e. 25,000 time steps), you are unrolling your network 25,000 times, that is, you obtain an unrolled feedforward network with 25,000 layers!

So, even though I don't know why you don't get any error, the problem is probably an out-of-memory issue: you simply cannot fit the variables of 25,000 unrolled layers into memory.
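
As a rough back-of-envelope (the hidden size and the number of saved tensors per step below are assumptions, not values from "test.py"), the activations saved for BPTT grow linearly with the number of unrolled steps:

# Rough estimate of activation memory kept for BPTT (all values are assumptions)
batch, steps, hidden = 4, 25000, 512   # hidden = 512 is hypothetical
tensors_per_step = 8                   # gates, cell and hidden states, etc.
bytes_needed = batch * steps * hidden * tensors_per_step * 4  # float32
print(bytes_needed / 1e9, "GB")        # ~1.6 GB for activations alone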

When you have to deal with long time series, you need to split your data into chunks (let's say of 20 time steps). You provide a single chunk per run. Then, at each following run, you need to restore the initial state of the network from the last state of the previous run.

Let me give you an example. What you have now (ignoring the third dimension for simplicity) is a 4x25000 matrix that is shaped something like this:

--------------------- 25000 ---------------------
|                                                |
|                                                |
4                                                |
|                                                |
|                                                |
--------------------------------------------------

You now have to split it into chunks like these:

----20-----  ----20-----  ----20-----
|         |  |         |  |         |
|         |  |         |  |         |
4         |  4         |  4         |  [...]
|         |  |         |  |         |
|         |  |         |  |         |
-----------  -----------  ----------- 
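
In NumPy, one (illustrative) way to produce these chunks from your original array would be:

import numpy as np

series = np.random.randn(4, 25000, 1)           # stand-in for your real data
chunks = np.split(series, 25000 // 20, axis=1)  # 1250 arrays, each of shape (4, 20, 1)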

You provide a single chunk of 4x20 at each run. Then, the final state of your LSTM after each chunk must be provided as the initial state for the next chunk.

So your feed_dict must be something like this:

feed_dict = {x: input_4_20,
             state.c: previous_state.c,
             state.h: previous_state.h}
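
Putting it all together, a minimal TensorFlow 1.x sketch of this chunked loop might look like the following (all names and sizes here are illustrative, not taken from "test.py"):

import numpy as np
import tensorflow as tf

batch_num, step_num, elem_num, hidden_num = 4, 20, 1, 12

x = tf.placeholder(tf.float32, [batch_num, step_num, elem_num])
# Placeholders for the LSTM state carried over from the previous chunk.
c_in = tf.placeholder(tf.float32, [batch_num, hidden_num])
h_in = tf.placeholder(tf.float32, [batch_num, hidden_num])
initial_state = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)

cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_num)
outputs, final_state = tf.nn.dynamic_rnn(cell, x, initial_state=initial_state)

series = np.random.randn(batch_num, 25000, elem_num).astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Start from a zero state, then thread the state through the chunks.
    prev_c = np.zeros((batch_num, hidden_num), np.float32)
    prev_h = np.zeros((batch_num, hidden_num), np.float32)
    for start in range(0, 25000, step_num):
        chunk = series[:, start:start + step_num, :]
        out, (prev_c, prev_h) = sess.run(
            [outputs, final_state],
            feed_dict={x: chunk, c_in: prev_c, h_in: prev_h})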

See the TensorFlow language model (LM) tutorial for an example of how to provide the state of an LSTM to the next run.

TensorFlow also provides some functions to do this automatically. Check the TensorFlow DevSummit tutorial on the RNN API for more; I linked the exact second where the desired function is explained. The function is tf.contrib.training.batch_sequences_with_states(...).

As a last piece of advice, I would suggest you rethink your task. As a matter of fact, a time series of 25,000 steps is a really LONG sequence, and I'm worried that even an LSTM can't manage such long past dependencies. What I mean is that by the time you are processing the 24,000th element of the series, the LSTM state has probably forgotten everything about the 1st element. In these cases, look at your data to see what the timescale of your phenomenon is. If you don't need a granularity of a single second (i.e. your series is highly redundant because features do not change very rapidly in time), downsample your series to get a shorter sequence to manage, as sketched below.
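
For instance, a minimal NumPy sketch of such downsampling (the factor of 50 is an arbitrary assumption) could be:

import numpy as np

series = np.random.randn(4, 25000, 1)  # stand-in for your real data
factor = 50                            # hypothetical downsampling factor
# Average every `factor` consecutive seconds into one sample.
downsampled = series.reshape(4, 25000 // factor, factor, 1).mean(axis=2)
# downsampled.shape == (4, 500, 1)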

answered Nov 15 '22 by Giuseppe Marra