I have a tf.data.Dataset instance which holds 3 different features:

- label, which is a scalar
- sequence_feature, which is a sequence of scalars
- seq_of_seqs_feature, which is a sequence of sequences

I am trying to use tf.data.Dataset.padded_batch() to generate padded data as input to my model, and I want to pad every feature differently.
Example batch:
[{'label': 24,
'sequence_feature': [1, 2],
'seq_of_seqs_feature': [[11.1, 22.2],
[33.3, 44.4]]},
{'label': 32,
'sequence_feature': [3, 4, 5],
'seq_of_seqs_feature': [[55.55, 66.66]]}]
Expected output:
[{'label': 24,
'sequence_feature': [1, 2, 0],
'seq_of_seqs_feature': [[11.1, 22.2],
[33.3, 44.4]]},
{'label': 32,
'sequence_feature': [3, 4, 5],
'seq_of_seqs_feature': [[55.55, 66.66],
[ 0.0,   0.0 ]]}]
As you can see, the label feature should not be padded, while sequence_feature and seq_of_seqs_feature should each be padded to the length of the longest corresponding entry in the given batch.
The tf.data.Dataset.padded_batch() method allows you to specify padded_shapes for each component (feature) of the resulting batch. For example, if your input dataset is called ds:
padded_ds = ds.padded_batch(
BATCH_SIZE,
padded_shapes={
'label': [], # Scalar elements, no padding.
'sequence_feature': [None], # Vector elements, padded to longest.
'seq_of_seqs_feature': [None, None], # Matrix elements, padded to longest
}) # in each dimension.
Notice that the padded_shapes
argument has the same structure as your input dataset's elements, so in this case it takes a dictionary with keys that match your feature names.
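To make this concrete, here is a minimal end-to-end sketch of the scenario above. It reconstructs the example dataset with tf.data.Dataset.from_generator (an assumption — the question does not say how the dataset was built) and applies the padded_batch call shown in the answer:

```python
import tensorflow as tf

# The two example elements from the question.
examples = [
    {'label': 24,
     'sequence_feature': [1, 2],
     'seq_of_seqs_feature': [[11.1, 22.2], [33.3, 44.4]]},
    {'label': 32,
     'sequence_feature': [3, 4, 5],
     'seq_of_seqs_feature': [[55.55, 66.66]]},
]

def gen():
    for ex in examples:
        yield ex

# Variable-length dims are declared as None in the element signature.
ds = tf.data.Dataset.from_generator(
    gen,
    output_signature={
        'label': tf.TensorSpec(shape=[], dtype=tf.int32),
        'sequence_feature': tf.TensorSpec(shape=[None], dtype=tf.int32),
        'seq_of_seqs_feature': tf.TensorSpec(shape=[None, None],
                                             dtype=tf.float32),
    })

padded_ds = ds.padded_batch(
    2,
    padded_shapes={
        'label': [],                        # Scalars, no padding.
        'sequence_feature': [None],         # Pad to longest in batch.
        'seq_of_seqs_feature': [None, None],  # Pad both dims to longest.
    })

batch = next(iter(padded_ds))
print(batch['sequence_feature'].numpy())      # [1, 2] padded with a 0
print(batch['seq_of_seqs_feature'].numpy())   # second matrix padded to 2x2
```

Note that in recent TensorFlow 2.x releases, padded_shapes may be omitted entirely, in which case every unknown dimension is padded to the longest element in the batch; passing it explicitly, as here, is still useful when you want per-feature control or fixed target sizes.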