I'm cycling through an image folder and this keeps happening.
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected image (JPEG, PNG, or GIF), got unknown format starting with '\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000' [[{{node DecodeJpeg}}]]
There are files in this folder that aren't images, but they should be filtered out by my previous step. Does anyone have an idea of what's going on?
test_files_ds = tf.data.Dataset.list_files(myFolder + '/*.jpg')
AUTOTUNE = tf.data.experimental.AUTOTUNE
def process_unlabeled_img(file_path):
    img = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(images=img, size=(224, 224))
    return file_path, img
It's hard to know exactly what is going on without having the file at hand, but what is probably happening is that some files in your dataset have a .jpg, .jpeg, .png or .gif extension without actually being JPEG, PNG or GIF images, so TensorFlow can't decode them. Your '/*.jpg' pattern only filters by file name, not by content, and the error message shows the offending file starting with a run of null bytes rather than an image header.
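To find out which files are responsible, you could scan the folder and print the first bytes of every .jpg file that does not start with the JPEG signature. This is only a minimal sketch using the standard library; the function name find_suspect_jpegs is made up for illustration, and it matches the extension case-insensitively, which may differ from your glob pattern.

```python
import os

# Every JPEG file starts with these three bytes.
JPEG_SIGNATURE = b"\xff\xd8\xff"


def find_suspect_jpegs(folder):
    """Return (path, header) pairs for .jpg files that do not start like a JPEG.

    Hypothetical helper: `header` holds the first 16 bytes of the file,
    which is the part TensorFlow quotes in its error message.
    """
    suspects = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".jpg"):
            continue
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            header = f.read(16)
        if not header.startswith(JPEG_SIGNATURE):
            suspects.append((path, header))
    return suspects

# Example use (myFolder is the folder variable from the question):
#     for path, header in find_suspect_jpegs(myFolder):
#         print(path, header.hex())
```

A file that shows up with a header of all zeros is most likely the one behind the error in the question.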
One way to overcome this problem is to check the files that are supposedly images and get rid of the ones that aren't actual JPEG, PNG or GIF images.
Fully validating a JPEG, PNG or GIF file is more complicated than it seems, but checking the file signature / magic number (that is, the first few bytes of the file) is a good start and will catch most problem files.
In practice there are many ways to do this. One is to check each file individually with a function of this sort:
def is_image(filename, verbose=False):
    # read only the first few bytes; the with block closes the file afterwards
    with open(filename, 'rb') as f:
        data = f.read(10)
    # check if file is JPG or JPEG
    if data[:3] == b'\xff\xd8\xff':
        if verbose:
            print(filename + " is: JPG/JPEG.")
        return True
    # check if file is PNG
    if data[:8] == b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a':
        if verbose:
            print(filename + " is: PNG.")
        return True
    # check if file is GIF (GIF87a or GIF89a)
    if data[:6] in [b'\x47\x49\x46\x38\x37\x61', b'\x47\x49\x46\x38\x39\x61']:
        if verbose:
            print(filename + " is: GIF.")
        return True
    return False
You could then get rid of the invalid images with something like this (note that this permanently deletes every file in the folder that fails the check):
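import os
# folder is the path to your image directory (myFolder in the question)
# go through all files in desired folder
for filename in os.listdir(folder):
    # os.listdir returns bare names, so build the full path before opening
    path = os.path.join(folder, filename)
    # check if file is actually an image file
    if os.path.isfile(path) and not is_image(path, verbose=False):
        # if the file is not valid, remove it
        os.remove(path)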
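Since deleting is irreversible, you may prefer to move the failing files aside first and inspect them. The following is a minimal, self-contained sketch of that idea, not a definitive implementation; the names has_image_signature and quarantine_invalid_images and the quarantine folder are assumptions made up for this example.

```python
import os
import shutil

# File signatures (magic numbers) of the formats named in the error message.
SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a",             # GIF, 1987 version
    b"GIF89a",             # GIF, 1989 version
)


def has_image_signature(path):
    """Return True if the file starts with a JPEG, PNG or GIF signature."""
    with open(path, "rb") as f:
        header = f.read(8)
    # bytes.startswith accepts a tuple and matches any of its entries
    return header.startswith(SIGNATURES)


def quarantine_invalid_images(folder, quarantine_dir):
    """Move files without a known image signature into quarantine_dir.

    Hypothetical helper: returns the names of the files that were moved.
    """
    os.makedirs(quarantine_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        if not has_image_signature(path):
            shutil.move(path, os.path.join(quarantine_dir, name))
            moved.append(name)
    return moved
```

Calling quarantine_invalid_images(folder, some_other_directory) leaves the image folder clean and lets you restore anything that was moved by mistake. Like the deleting loop, it moves every file that fails the check, including files that were never meant to be images.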
Now, as I said, this will probably solve your problem, but please note that the function is_image cannot tell for sure whether a file can be read as a JPG, JPEG, PNG or GIF image. It only checks the first few bytes, so a file with a valid signature but corrupted or truncated contents will still pass. It is a quick and dirty solution that will get rid of the vast majority of errors like this one, but not all.