Converting a TensorFlow Dataset of time series elements to a Dataset of windowed sequences

Tags:

tensorflow

I have a tf.data.Dataset (r1.4) whose elements represent a time series. For example (line breaks separate elements):

1
2
3
4
5
6
7
8
9

Now I want to run a window operation on it so that I get a Dataset of subsequences of length WINDOW_SIZE for training an RNN. For example, for WINDOW_SIZE=4:

1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9

The closest Dataset op I could find is tf.contrib.data.group_by_window, but I am not sure how to apply it to this use case.
Another way is to use tf.contrib.data.batch_and_drop_remainder, but it divides the elements into non-overlapping buckets, so it does not produce all the subsequences.
A third option I thought of was to create WINDOW_SIZE iterators, advance them individually so that they point to consecutive elements, and then read from them in lockstep. However, this looks quite counterintuitive.
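
For illustration, the staggered-iterator idea can be sketched in plain Python. This is only an analogue of the approach, not tf.data code: it assumes the series is an ordinary Python iterable, and the helper name staggered_windows is hypothetical.

```python
from itertools import islice, tee

WINDOW_SIZE = 4  # window length used in the example above


def staggered_windows(iterable, window_size):
    """Yield overlapping windows by reading window_size offset iterators in lockstep."""
    # Create window_size independent iterators over the same series.
    iterators = tee(iterable, window_size)
    # Skip i elements on the i-th iterator so the iterators point to consecutive elements.
    staggered = [islice(it, i, None) for i, it in enumerate(iterators)]
    # zip stops as soon as the most advanced iterator is exhausted,
    # so incomplete trailing windows are dropped.
    return zip(*staggered)


windows = [list(w) for w in staggered_windows(range(1, 10), WINDOW_SIZE)]
for w in windows:
    print(w)
```

For the series 1 to 9 this produces the six windows listed above, from [1, 2, 3, 4] to [6, 7, 8, 9].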

devin asked Dec 01 '17 07:12


1 Answer

In TensorFlow 2.0, the Dataset class now has a window() method. You can use it like this:

import tensorflow as tf

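dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))
# Slide a window of 5 elements forward one element at a time; drop incomplete windows.
dataset = dataset.window(5, shift=1, drop_remainder=True)
# Each window is itself a small Dataset, so iterate over it to read its elements.
for window in dataset:
    print([elem.numpy() for elem in window])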

It will output:

[0, 1, 2, 3, 4]
[1, 2, 3, 4, 5]
[2, 3, 4, 5, 6]
[3, 4, 5, 6, 7]
[4, 5, 6, 7, 8]
[5, 6, 7, 8, 9]
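
Note that each element produced by window() is a nested Dataset rather than a tensor. One common way to get fixed-length tensors for an RNN is to pass each window through batch() inside flat_map(). The following is a minimal sketch of that pattern, assuming TensorFlow 2.x with eager execution; WINDOW_SIZE comes from the question and the helper name make_windows is hypothetical.

```python
import tensorflow as tf

WINDOW_SIZE = 4  # window length from the question


def make_windows(series, window_size):
    """Return a Dataset whose elements are 1-D tensors of length window_size."""
    ds = tf.data.Dataset.from_tensor_slices(series)
    # window() yields one nested Dataset per sliding window.
    ds = ds.window(window_size, shift=1, drop_remainder=True)
    # Batching a nested Dataset to its full length collapses it into a single tensor.
    return ds.flat_map(lambda w: w.batch(window_size, drop_remainder=True))


# The series 1..9 from the question.
windows = [w.numpy().tolist() for w in make_windows(tf.range(1, 10), WINDOW_SIZE)]
for w in windows:
    print(w)
```

With the series 1 to 9 this should yield the six subsequences from the question, [1, 2, 3, 4] through [6, 7, 8, 9], each as a 1-D tensor.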

MiniQuark answered Oct 10 '22 02:10