
How do I make a ragged batch in TensorFlow 2.0?

I'm trying to build a data input pipeline from a TensorFlow Dataset that consists of 1-D tensors of numerical data. I would like to create batches of ragged tensors; I do not want to pad the data.

For instance, if my data is of the form:

[
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4],
    ...
]

I would like my dataset to consist of batches of the form:

<tf.Tensor [
    <tf.RaggedTensor [
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
        [0, 1, 2, 3, 4], 
        ...]>,
    <tf.RaggedTensor [
        [ ... ],
        ...]>
    ]>

I've tried creating a RaggedTensor inside a map, but I can't get it to work on one-dimensional data.
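
A minimal sketch of such a dataset, for reproduction purposes. It assumes the data can be fed through `tf.data.Dataset.from_generator` with an `output_signature` (available in more recent TensorFlow 2.x releases); the sample values and the helper name `make_dataset` are hypothetical and not part of the original setup.

```python
import tensorflow as tf

# Illustrative sample rows (assumption): 1-D sequences of different lengths.
SAMPLE_ROWS = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [0, 1, 2, 3, 4],
    [0, 1, 2],
]


def make_dataset(rows):
    """Hypothetical helper: build a dataset of variable-length 1-D int32 tensors."""
    return tf.data.Dataset.from_generator(
        lambda: iter(rows),
        # shape=[None] marks the single dimension as variable-length.
        output_signature=tf.TensorSpec(shape=[None], dtype=tf.int32),
    )


dataset = make_dataset(SAMPLE_ROWS)

# Each element is a 1-D tensor; collect the element lengths for inspection.
lengths = [int(tf.size(x)) for x in dataset]
print(lengths)
```

Because the elements have different lengths, a plain `dataset.batch(...)` on this dataset would typically fail when iterated, which is why a ragged (or padded) batch is needed.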

ChosunOne asked Sep 14 '25 09:09



1 Answer

I think this can be achieved with a little work before and after the batch.

import tensorflow as tf

# `dataset` yields 1-D tensors of varying length; BATCH_SIZE is the desired batch size.
# First, expand along axis 0 so each element has shape [1, n]
dataset = dataset.map(lambda x: tf.expand_dims(x, 0))
# Then convert each element to a RaggedTensor with a ragged rank of 1
dataset = dataset.map(lambda x: tf.RaggedTensor.from_tensor(x))
# Create batches; each batch is a RaggedTensor of shape [batch, 1, (n)]
dataset = dataset.batch(BATCH_SIZE)
# Squeeze out the extra dimension, leaving shape [batch, (n)]
dataset = dataset.map(lambda x: tf.squeeze(x, axis=1))

Then the final output will be of the form:

<tf.RaggedTensor [
    <tf.Tensor [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]>,
    <tf.Tensor [0, 1, 2, 3, 4]>,
    ...
]>

for each batch.
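
To try the steps end to end, here is a self-contained sketch. The sample rows, the `BATCH_SIZE` value, and the use of `tf.data.Dataset.from_generator` with an `output_signature` are assumptions for illustration; they are not taken from the question's real data, and behavior may differ between TensorFlow versions.

```python
import tensorflow as tf

BATCH_SIZE = 2  # illustrative value (assumption)

# Illustrative rows (assumption): 1-D sequences of different lengths.
rows = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [0, 1, 2, 3, 4],
    [0, 1, 2],
]

# Variable-length 1-D elements; shape=[None] leaves the length unspecified.
dataset = tf.data.Dataset.from_generator(
    lambda: iter(rows),
    output_signature=tf.TensorSpec(shape=[None], dtype=tf.int32),
)

# The four steps described in this answer.
dataset = dataset.map(lambda x: tf.expand_dims(x, 0))            # [n] -> [1, n]
dataset = dataset.map(lambda x: tf.RaggedTensor.from_tensor(x))  # dense -> ragged
dataset = dataset.batch(BATCH_SIZE)                              # ragged batches
dataset = dataset.map(lambda x: tf.squeeze(x, axis=1))           # drop the size-1 axis

# Convert each ragged batch to nested Python lists for inspection.
batches = [batch.to_list() for batch in dataset]
for b in batches:
    print(b)
```

Each printed batch should keep the rows at their original lengths, with no padding. Newer TensorFlow releases also provide `tf.data.experimental.dense_to_ragged_batch`, which may remove the need for the expand and squeeze steps.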

ChosunOne answered Sep 15 '25 23:09
