Write tf.dataset back to TFRecord

Question

After creating a tf.data.Dataset, I would like to write it to TFRecords.

One way to do that is to iterate through the complete dataset, serialize each element with SerializeToString, and write it to a TFRecord file. But that is not the most efficient way to do it.
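For reference, the manual approach described above might look roughly like the sketch below. The toy dataset, the feature names "feature" and "label", and the helper names serialize_example and write_dataset are assumptions made for illustration; only tf.train.Example, SerializeToString and tf.io.TFRecordWriter are the actual TensorFlow pieces.

```python
import os
import tempfile

import tensorflow as tf


def serialize_example(feature, label):
    """Pack one (feature, label) pair into a serialized tf.train.Example."""
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                # Feature names here are illustrative, not prescribed by TF.
                "feature": tf.train.Feature(
                    float_list=tf.train.FloatList(value=feature.tolist())
                ),
                "label": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[int(label)])
                ),
            }
        )
    )
    return example.SerializeToString()


def write_dataset(dataset, path):
    """Iterate over the whole dataset eagerly and write one record per element."""
    count = 0
    with tf.io.TFRecordWriter(path) as writer:
        for feature, label in dataset.as_numpy_iterator():
            writer.write(serialize_example(feature, label))
            count += 1
    return count


# Hypothetical toy dataset: four 2-d float vectors with integer labels.
toy_dataset = tf.data.Dataset.from_tensor_slices(
    ([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]], [0, 1, 0, 1])
)
record_path = os.path.join(tempfile.mkdtemp(), "toy.tfrecord")
num_written = write_dataset(toy_dataset, record_path)
```

Every element passes through Python one at a time here, which is why this route becomes slow for large datasets.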

Are there easier ways to do this? Are there any APIs available in TF2.0?

nessuno · Accepted Answer

You could use TensorFlow Datasets (tfds): this library is not only a collection of ready-to-use tf.data.Dataset objects, but also a toolchain for transforming raw data into TFRecords.

Adding a new dataset is straightforward if you follow the official guide. In short, the main methods you have to implement are _info and _generate_examples (the builder also needs _split_generators to define the splits).

In particular, _generate_examples is the method tfds uses to create the rows inside the TFRecords. Every element that _generate_examples yields is a unique key paired with a feature dictionary; every dictionary becomes one row in a TFRecord file.

For example (taken from the official documentation), the _generate_examples below is used by tfds to save TFRecords in which each row has the features "image_description", "image" and "label".

def _generate_examples(self, images_dir_path, labels):
  # Read the input data out of the source files
  for image_file in tf.io.gfile.listdir(images_dir_path):
    ...
  with tf.io.gfile.GFile(labels) as f:
    ...

  # And yield examples as feature dictionaries
  for image_id, description, label in data:
    yield image_id, {
        "image_description": description,
        "image": "%s/%s.jpeg" % (images_dir_path, image_id),
        "label": label,
    }

In your case, you can just take the tf.data.Dataset object you already have, loop through it inside the _generate_examples method, and yield the rows of the TFRecord.

This way, tfds takes care of the serialization for you, and you'll find the TFRecord files created for your dataset in the ~/tensorflow_datasets folder.
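As a rough sketch of that loop, the generator below walks an existing tf.data.Dataset and yields (key, feature dictionary) pairs in the same shape as the documentation example. The dataset contents and the function name generate_examples_from_dataset are hypothetical; in a real builder the body would live inside _generate_examples, and the feature names would have to match the ones declared in _info.

```python
import tensorflow as tf

# Hypothetical stand-in for the tf.data.Dataset you already have.
existing_dataset = tf.data.Dataset.from_tensor_slices(
    {"image_description": ["a cat", "a dog"], "label": [0, 1]}
)


def generate_examples_from_dataset(dataset):
    """Yield (key, feature_dict) pairs, mirroring what _generate_examples yields."""
    for index, element in enumerate(dataset.as_numpy_iterator()):
        # The running index serves as the unique key for each example.
        yield index, {
            # String tensors come back as bytes, so decode them to str.
            "image_description": element["image_description"].decode("utf-8"),
            "label": int(element["label"]),
        }


examples = list(generate_examples_from_dataset(existing_dataset))
```

Using the running index as the key is just one simple choice; any value that is unique per example would do.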

Tags:

tensorflow

tensorflow2.0

tensorflow-datasets

yuva-rajulu
