
Split .tfrecords file into many .tfrecords files

Is there any way to split a .tfrecords file into many .tfrecords files directly, without writing back each Dataset example?

christk asked Feb 04 '19 15:02



1 Answer

In TensorFlow 2.0.0, this will work:

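import tensorflow as tf

# Read the serialized records as-is; nothing is parsed or decoded.
raw_dataset = tf.data.TFRecordDataset("input_file.tfrecord")

shards = 10  # number of output files

for i in range(shards):
    # shard(shards, i) keeps every shards-th record, starting at index i.
    writer = tf.data.experimental.TFRecordWriter(f"output_file-part-{i}.tfrecord")
    writer.write(raw_dataset.shard(shards, i))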
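
If TensorFlow is not available, or reading the input once per shard is too slow, the file can probably also be split at the byte level. The sketch below is not part of the answer above. It assumes an uncompressed TFRecord file, where each record is stored as an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload. It copies each record frame verbatim, round-robin, so nothing is decoded and the CRCs are not verified. The function name split_tfrecord and its parameters are hypothetical.

```python
import struct


def split_tfrecord(input_path, output_pattern, num_shards):
    """Copy raw record frames round-robin into num_shards files.

    Assumes an uncompressed TFRecord file. CRCs are copied, not checked.
    Returns the number of records written to each shard.
    """
    counts = [0] * num_shards
    outputs = [open(output_pattern.format(i), "wb") for i in range(num_shards)]
    try:
        with open(input_path, "rb") as src:
            index = 0
            while True:
                # 8-byte little-endian length + 4-byte CRC of the length
                header = src.read(12)
                if not header:
                    break  # clean end of file
                if len(header) < 12:
                    raise ValueError("truncated record header")
                (length,) = struct.unpack("<Q", header[:8])
                # payload + 4-byte CRC of the payload
                body = src.read(length + 4)
                if len(body) < length + 4:
                    raise ValueError("truncated record body")
                shard = index % num_shards
                outputs[shard].write(header)
                outputs[shard].write(body)
                counts[shard] += 1
                index += 1
    finally:
        for f in outputs:
            f.close()
    return counts
```

For example, split_tfrecord("input_file.tfrecord", "output_file-part-{}.tfrecord", 10) would produce the same file names as the loop above in a single pass over the input.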
Eric answered Oct 02 '22 00:10