I'm trying to create a data input pipeline from a Tensorflow Dataset that consists of 1d tensors of numerical data. I would like to create batches of ragged tensors; I do not want to pad the data.
For instance, if my data is of the form:
[
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4]
...
]
I would like my dataset to consist of batches of the form:
<tf.Tensor [
<tf.RaggedTensor [
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4],
...]>,
<tf.RaggedTensor [
[ ... ],
...]>
]>
I've tried creating a RaggedTensor
using a map but I can't seem to do it on one dimensional data.
I think this can be achieved with a little work before and after the batch.
# First, you can expand along the 0 axis for each data point
dataset = dataset.map(lambda x: tf.expand_dims(x, 0))
# Then create a RaggedTensor with a ragged rank of 1
dataset = dataset.map(lambda x: tf.RaggedTensor.from_tensor(x))
# Create batches
dataset = dataset.batch(BATCH_SIZE)
# Squeeze the extra dimension from the created batches
dataset = dataset.map(lambda x: tf.squeeze(x, axis=1))
Then the final output will be of the form:
<tf.RaggedTensor [
<tf.Tensor [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]>,
<tf.Tensor [0, 1, 2, 3]>,
...
]>
for each batch.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With