
Tensorflow: Count number of examples in a TFRecord file -- without using deprecated `tf.python_io.tf_record_iterator`

Please read the post before marking it as a duplicate:

I was looking for an efficient way to count the number of examples in a TFRecord file of images. Since a TFRecord file does not save any metadata about the file itself, the user has to loop through the file in order to calculate this information.

There are a few different questions on StackOverflow that address this. The problem is that all of them seem to use the deprecated `tf.python_io.tf_record_iterator` function, so none of them is a stable solution. Here is a sample of existing posts:

Obtaining total number of records from .tfrecords file in Tensorflow

Number of examples in each tfrecord


So I was wondering if there was a way to count the number of records using the new Dataset API.

krishnab asked Apr 09 '19 17:04



1 Answer

The `tf.data.Dataset` class has a `reduce` method. The documentation gives an example of counting records with it:

import numpy as np
import tensorflow as tf

# Build the dataset. Do not batch or repeat it before counting, and
# preferably avoid transformations such as map and shard.
ds = tf.data.Dataset.range(5)
# Count the elements: start at 0 and add 1 for each element.
cnt = ds.reduce(np.int64(0), lambda x, _: x + 1)

# cnt evaluates to 5
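
Applied to the question's case, the same reduce call can be pointed at a TFRecord file instead of a range. The following is only a minimal sketch, assuming TensorFlow 2.x with eager execution and an uncompressed file; the helper name count_tfrecord_examples is hypothetical and not part of TensorFlow.

```python
import numpy as np
import tensorflow as tf


def count_tfrecord_examples(path):
    """Count the records in a TFRecord file using Dataset.reduce.

    Assumes TF 2.x eager execution and an uncompressed file.
    `count_tfrecord_examples` is a hypothetical helper name.
    """
    # Each dataset element is one serialized record (a scalar string tensor).
    ds = tf.data.TFRecordDataset(path)
    # Start at 0 and add 1 per record; the record contents are ignored.
    cnt = ds.reduce(np.int64(0), lambda x, _: x + 1)
    return int(cnt.numpy())
```

The records are still read from disk one by one, so this replaces the deprecated iterator rather than avoiding the full pass over the file.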

I don't know whether this method is faster than @krishnab's for loop.
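
For comparison, a plain loop that does not use TensorFlow at all can be sketched as follows. This is a different technique from the reduce approach and is not @krishnab's loop: it assumes the standard uncompressed TFRecord framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload) and skips the CRCs without verifying them. The function name count_tfrecord_records is hypothetical.

```python
import struct


def count_tfrecord_records(path):
    """Count records by walking the TFRecord framing of an uncompressed file.

    Hypothetical helper: CRCs are skipped, not verified, so a corrupted or
    truncated final record is not detected.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)  # uint64 little-endian payload length
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated length header")
            (length,) = struct.unpack("<Q", header)
            # Skip the 4-byte length CRC, the payload, and the 4-byte payload CRC.
            f.seek(4 + length + 4, 1)
            count += 1
    return count
```

Because only the length headers are read and the payloads are skipped with seek, this may be cheaper than decoding every record, but that has not been measured here.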

Maosi Chen answered Oct 04 '22 03:10
