I have a tf.data.Dataset (r1.4) whose elements represent a time series. For example (line breaks separate elements):
1
2
3
4
5
6
7
8
9
Now I want to run a window operation on it so that I get a Dataset of subsequences of length WINDOW_SIZE for training an RNN. For example, for WINDOW_SIZE=4:
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9
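To pin down the transformation being asked for, here is a minimal pure-Python sketch of the same sliding window, independent of TensorFlow. The helper name sliding_windows is hypothetical and only illustrates the expected output; it is not a Dataset op.

```python
from collections import deque
from itertools import islice


def sliding_windows(iterable, window_size):
    """Yield every run of `window_size` consecutive items as a tuple.

    Hypothetical helper for illustration only; not part of tf.data.
    """
    it = iter(iterable)
    # Fill the first window; maxlen makes later appends drop the oldest item.
    window = deque(islice(it, window_size), maxlen=window_size)
    if len(window) == window_size:
        yield tuple(window)
    for item in it:
        window.append(item)
        yield tuple(window)


series = range(1, 10)  # the elements 1..9 from the example
windows = list(sliding_windows(series, 4))
for w in windows:
    print(*w)
```

Each window overlaps the previous one by WINDOW_SIZE - 1 elements, which is what distinguishes this from plain batching.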
The closest Dataset op I could find is tf.contrib.data.group_by_window, but I'm not sure how to apply it to this use case.
Another way is to use tf.contrib.data.batch_and_drop_remainder, but it divides the elements into non-overlapping buckets, so it won't produce all the subsequences.
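A small pure-Python sketch of why batching falls short here, assuming the batch op simply groups consecutive elements and discards a trailing partial batch. The function non_overlapping_batches is a hypothetical stand-in written for illustration, not the TensorFlow implementation.

```python
def non_overlapping_batches(items, batch_size):
    """Group items into consecutive batches and drop a partial last batch.

    Hypothetical stand-in that mimics batch-and-drop-remainder behaviour.
    """
    items = list(items)
    # Number of items that fit into complete batches.
    full = len(items) - len(items) % batch_size
    return [items[i:i + batch_size] for i in range(0, full, batch_size)]


batches = non_overlapping_batches(range(1, 10), 4)
for b in batches:
    print(*b)
```

For the series 1..9 this gives only 1 2 3 4 and 5 6 7 8, so subsequences such as 2 3 4 5 never appear and the element 9 is dropped.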
A third option I thought of was to create WINDOW_SIZE iterators, advance them individually so that they point to consecutive elements, and then read from them in sequence. However, this seems quite counterintuitive.
In TensorFlow 2.0, the Dataset class now has a window() method. You can use it like this:
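import tensorflow as tf

# Source dataset with the elements 0..9.
dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))
# Sliding windows of 5 elements, advancing by 1; incomplete windows are dropped.
dataset = dataset.window(5, shift=1, drop_remainder=True)
# Each window is itself a Dataset, so iterate over its elements to read them.
for window in dataset:
    print([elem.numpy() for elem in window])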
It will output:
[0, 1, 2, 3, 4]
[1, 2, 3, 4, 5]
[2, 3, 4, 5, 6]
[3, 4, 5, 6, 7]
[4, 5, 6, 7, 8]
[5, 6, 7, 8, 9]
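Because window() yields each window as a nested Dataset rather than a tensor, a common follow-up for RNN training is to flatten the windows into fixed-length tensors. The following is a sketch under the assumption of TensorFlow 2.x with eager execution, using the WINDOW_SIZE=4 series 1..9 from the question; the variable names are illustrative.

```python
import tensorflow as tf

WINDOW_SIZE = 4

# Elements 1..9, matching the time series in the question.
dataset = tf.data.Dataset.from_tensor_slices(tf.range(1, 10))

# window() produces a dataset of small datasets, one per window.
windows = dataset.window(WINDOW_SIZE, shift=1, drop_remainder=True)

# Batch each window dataset into one tensor of length WINDOW_SIZE,
# then flatten so the result is a dataset of tensors.
sequences = windows.flat_map(
    lambda w: w.batch(WINDOW_SIZE, drop_remainder=True)
)

result = [seq.numpy().tolist() for seq in sequences]
for row in result:
    print(row)
```

This should print the six rows from [1, 2, 3, 4] through [6, 7, 8, 9], and each element of sequences is a single tensor that can be batched or shuffled further.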