Is it possible to obtain the total number of records from a .tfrecords file? Related to this, how does one generally keep track of the number of epochs that have elapsed while training a model? While it is possible to specify the batch_size and num_of_epochs, I am not sure whether it is straightforward to obtain values such as the current epoch, the number of batches per epoch, etc., just so that I could have more control over how the training is progressing. Currently I'm using a dirty hack to compute this, since I know beforehand how many records there are in my .tfrecords file and the size of my minibatches. Appreciate any help.
```python
reader = tf.TFRecordReader()
file = tf.train.string_input_producer(["record.tfrecord"])
_, serialized_record = reader.read(file)
```
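The bookkeeping described above (record count known beforehand, fixed minibatch size) can be sketched with plain arithmetic; the numbers below are illustrative, not from the question:

```python
# A sketch of tracking the current epoch from a global step counter,
# assuming the record count and minibatch size are known beforehand.
num_records = 10000   # total examples in the .tfrecords file (illustrative)
batch_size = 32

# ceil division: a final partial batch still counts as a batch
batches_per_epoch = -(-num_records // batch_size)

def epoch_and_batch(global_step):
    """Map a global step (batches seen so far) to (epoch, batch-in-epoch)."""
    return global_step // batches_per_epoch, global_step % batches_per_epoch

print(batches_per_epoch)       # 313
print(epoch_and_batch(313))    # (1, 0)
```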
The rule of thumb is to have at least 10 times as many files as there will be hosts reading the data. At the same time, each file should be large enough (at least 10 MB, ideally 100 MB or more) so that you benefit from I/O prefetching.
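One way to follow that guideline is to shard a dataset round-robin across several TFRecord files, so each reader host can consume distinct shards. A sketch, assuming TF 2.x; the shard count, file names, and dummy records are illustrative:

```python
import tensorflow as tf

# Shard 100 dummy records across 10 TFRecord files, round-robin.
num_shards = 10
paths = [f"data-{i:05d}-of-{num_shards:05d}.tfrecord" for i in range(num_shards)]
writers = [tf.io.TFRecordWriter(p) for p in paths]

for idx in range(100):  # dummy records standing in for real examples
    ex = tf.train.Example(features=tf.train.Features(
        feature={"id": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[idx]))}))
    writers[idx % num_shards].write(ex.SerializeToString())

for w in writers:
    w.close()
```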
This dataset loads TFRecords from the files as bytes, exactly as they were written. TFRecordDataset does not do any parsing or decoding on its own. Parsing and decoding can be done by applying Dataset.map transformations after the TFRecordDataset.
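A minimal end-to-end sketch of that split between loading and parsing, assuming TF 2.x; the file name and the feature name "value" are illustrative:

```python
import tensorflow as tf

path = "example.tfrecord"

# Write one record holding a single int64 feature.
with tf.io.TFRecordWriter(path) as w:
    ex = tf.train.Example(features=tf.train.Features(
        feature={"value": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[7]))}))
    w.write(ex.SerializeToString())

# TFRecordDataset yields raw bytes; parsing happens in a map() step.
def parse(raw):
    return tf.io.parse_single_example(
        raw, {"value": tf.io.FixedLenFeature([], tf.int64)})

ds = tf.data.TFRecordDataset(path).map(parse)
for rec in ds:
    print(int(rec["value"]))  # 7
```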
To count the number of records, you should be able to use tf.python_io.tf_record_iterator.
```python
c = 0
for fn in tf_records_filenames:
    for record in tf.python_io.tf_record_iterator(fn):
        c += 1
```
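In newer TF versions, where tf.python_io.tf_record_iterator is deprecated, the same count can be obtained by iterating a TFRecordDataset. A sketch, assuming TF 2.x:

```python
import tensorflow as tf

def count_records(filenames):
    """Count records by iterating the raw (unparsed) TFRecordDataset."""
    return sum(1 for _ in tf.data.TFRecordDataset(filenames))
```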
To keep track of how model training is progressing more generally, TensorBoard comes in handy.