I am working on an image classification Kaggle competition and downloaded some training images from Kaggle.com. I am using transfer learning with ResNet50 on these images, with Keras 2.0 on a TensorFlow backend (and Python 3).
However, 258 of the 1281 training images trigger a 'Possibly corrupt EXIF data' warning and are ignored when loaded into the ResNet model, very likely due to a Pillow issue.
The output messages are like:
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 524288 bytes but only got 0. Skipping tag 3
"Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 393216 bytes but only got 0. Skipping tag 3
"Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 33554432 bytes but only got 0. Skipping tag 4
"Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 25165824 bytes but only got 0. Skipping tag 4
"Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 131072 bytes but only got 0. Skipping tag 3
"Skipping tag %s" % (size, len(data), tag))
(more to come ...)
From the output messages I only know that such images exist, not which files they are.
My question is: how can I identify these 258 images so that I can manually remove them from the data set?
Edit: To raise warnings as errors that you can catch, take a look at Justas' comment below.
Even though this question is over a year old, I want to show my solution because I ran into the same problem.
I edited the Pillow code that emits the warning. The warning output shows where to find that source file on your system and also the line number (here TiffImagePlugin.py, line 692). For example, I changed the following:
    if len(data) != size:
        warnings.warn("Possibly corrupt EXIF data. "
                      "Expecting to read %d bytes but only got %d."
                      " Skipping tag %s" % (size, len(data), tag))
        continue
to
    if len(data) != size:
        raise ValueError('Corrupt Exif data')
        warnings.warn("Possibly corrupt EXIF data. "
                      "Expecting to read %d bytes but only got %d."
                      " Skipping tag %s" % (size, len(data), tag))
        continue
My code to catch the ValueError is shown below. The advantage is that PIL stops at the first corrupt tag instead of printing a useless message. You can also catch the error and act on it, e.g. delete the corresponding file in the 'except' part.
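    import os
    from PIL import Image

    imageFolder = '/Path/To/Image/Folder'  # replace with the path to your image folder
    listImages = os.listdir(imageFolder)

    for imgName in listImages:
        imgPath = os.path.join(imageFolder, imgName)
        try:
            # Keep the opened image in its own variable so imgName still holds the file name.
            image = Image.open(imgPath)
            exif_data = image._getexif()  # private Pillow method, present on JPEG images
        except ValueError as err:
            print(err)
            print("Error on image: ", imgName)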
I know adding the ValueError is quick and dirty, but it's better than being confronted with all the useless warning messages.
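The warnings-as-errors route mentioned in the edit note can probably be done without touching the Pillow source at all. The following is only a sketch under a few assumptions: the function name find_images_with_exif_warnings is made up for illustration, Pillow 6.0 or newer is assumed for Image.getexif() (older versions only have the private _getexif() on JPEG images), and it assumes the warning is issued as a UserWarning, as in the output shown in the question. Python's standard warnings module turns the warning into an exception inside a local block, so the file name is known at the moment the warning fires.

```python
import os
import warnings

from PIL import Image


def find_images_with_exif_warnings(image_folder):
    """Return (file name, message) pairs for files Pillow warns about or cannot open.

    Hypothetical helper, not part of Pillow or Keras.
    """
    flagged = []
    for file_name in sorted(os.listdir(image_folder)):
        image_path = os.path.join(image_folder, file_name)
        if not os.path.isfile(image_path):
            continue
        with warnings.catch_warnings():
            # Inside this block only, every UserWarning is raised as an exception.
            warnings.simplefilter("error", UserWarning)
            try:
                with Image.open(image_path) as image:
                    # Assumes Pillow 6.0+; forces the EXIF block to be parsed.
                    image.getexif()
            except UserWarning as warning:
                flagged.append((file_name, str(warning)))
            except (OSError, ValueError, SyntaxError) as err:
                # Files Pillow cannot read at all are reported as well.
                flagged.append((file_name, "unreadable: %s" % err))
    return flagged
```

Calling find_images_with_exif_warnings('/Path/To/Image/Folder') and printing the returned pairs should list the affected files, which can then be reviewed or removed by hand. Only the first warning per file is reported, because the exception interrupts the parsing.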