
How to fix truncated tfrecords for tensorflow?

Tags:

tensorflow

I created a large .tfrecords file without seeing any errors. During training, however, I got the error "truncated record at XXXX" when the TFRecordReader reached a point near the end of the .tfrecords file. How can I quickly check whether the file is really corrupted? If it is, how can I quickly fix it? (It's OK to discard the last few key-value pairs.)

read Read asked Oct 22 '16 05:10



2 Answers

The message means what it says: the TFRecord file ends unexpectedly partway through a record.

If you want to understand what's going on under the hood, the file format is quite simple and is documented here: https://www.tensorflow.org/versions/r0.11/api_docs/python/python_io.html#tfrecords-format-details
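As a rough illustration of that format, here is a minimal sketch of a checker that walks the record framing and reports where the last complete record ends. It assumes the documented layout (a little-endian uint64 length, a uint32 masked CRC of the length, the data bytes, and a uint32 masked CRC of the data). The function name `scan_tfrecord` is made up for this example, and the sketch does not verify the CRCs, so it only detects truncation, not flipped bytes.

```python
import os
import struct

HEADER_SIZE = 12  # uint64 length + uint32 masked CRC of the length
FOOTER_SIZE = 4   # uint32 masked CRC of the data


def scan_tfrecord(path):
    """Return (complete_records, last_good_offset, file_size).

    Hypothetical helper: it only follows the length fields and never
    checks the CRC values.
    """
    file_size = os.path.getsize(path)
    offset = 0
    records = 0
    with open(path, "rb") as f:
        while offset < file_size:
            f.seek(offset)
            header = f.read(HEADER_SIZE)
            if len(header) < HEADER_SIZE:
                break  # the header itself is cut off
            (length,) = struct.unpack("<Q", header[:8])
            end = offset + HEADER_SIZE + length + FOOTER_SIZE
            if end > file_size:
                break  # the data or its CRC is cut off
            offset = end
            records += 1
    return records, offset, file_size
```

If `last_good_offset` equals `file_size`, the framing is intact; otherwise the bytes after `last_good_offset` belong to an incomplete record.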

One quick thing to check: is the file you are reading really a TFRecord file? It's always good to be sure.

It's hard to give a good answer on "how corrupted" a TFRecord file is. All the reader code can do is tell you that something is internally inconsistent.

(Did your writing process terminate correctly and close the file when it was done?)

If you want to fix the file, probably your best bet is to regenerate it.

Alternatively you can read in the contents of the file using the reader functions documented at the link above, and write them out to a new TFRecord file. You'll lose the corrupted records, but you should be able to copy everything else over.
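The copy step could look roughly like the sketch below. Note that it does not use the TensorFlow reader functions: it copies the raw bytes of each complete record with plain Python, relying on the same documented framing (12-byte header, data, 4-byte CRC). The name `copy_complete_records` is hypothetical, and CRCs are copied through unverified, so this handles a truncated tail but not other kinds of corruption.

```python
import os
import struct

HEADER_SIZE = 12  # uint64 length + uint32 masked CRC of the length
FOOTER_SIZE = 4   # uint32 masked CRC of the data


def copy_complete_records(src_path, dst_path):
    """Copy every complete record to dst_path and return how many were copied.

    Hypothetical helper: stops at the first record that does not fit in
    the remaining bytes of the source file.
    """
    remaining = os.path.getsize(src_path)
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while remaining >= HEADER_SIZE:
            header = src.read(HEADER_SIZE)
            (length,) = struct.unpack("<Q", header[:8])
            record_size = HEADER_SIZE + length + FOOTER_SIZE
            if record_size > remaining:
                break  # truncated record: drop it and everything after it
            dst.write(header)
            dst.write(src.read(length + FOOTER_SIZE))
            remaining -= record_size
            copied += 1
    return copied
```

Writing to a new file rather than truncating in place keeps the original around in case the result needs to be compared or the copy redone.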

Peter Hawkins answered Sep 18 '22 13:09



I had some corrupted images (partially downloaded images, to be precise), but I could not catch them using imghdr.what(), Image.open().verify(), or cv2.imread.

The only solution that worked was this: Image.open(path/to/image).tobytes()

This call raises an IOError if the image is corrupted.
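A minimal sketch of running that check over a list of files before writing them into a TFRecord file might look like this. It assumes Pillow is installed; the names `find_corrupted_images` and `_decode_with_pillow` are invented for the example. In Python 3, IOError is an alias of OSError, which is what the sketch catches; SyntaxError is caught as well on the assumption that some decoders may raise it for malformed files.

```python
def _decode_with_pillow(path):
    # Imported lazily so this module still loads where Pillow is missing.
    from PIL import Image

    with Image.open(path) as img:
        img.tobytes()  # forces a full decode of the pixel data


def find_corrupted_images(paths, decode=_decode_with_pillow):
    """Return a list of (path, error message) for images that fail to decode.

    Hypothetical helper: `decode` is any callable that raises on a bad
    file; by default it fully decodes the image with Pillow.
    """
    bad = []
    for path in paths:
        try:
            decode(path)
        except (OSError, SyntaxError) as err:
            bad.append((path, str(err)))
    return bad
```

Called as `find_corrupted_images(list_of_paths)`, it returns the files to skip or re-download.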

Hope it helps someone.

Norman D answered Sep 18 '22 13:09
