
Tensorflow DecodeJPEG: Expected image (JPEG, PNG, or GIF), got unknown format starting with '\000\000\000\000\000\000\000\00'

I'm cycling through an image folder and this keeps happening.

tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected image (JPEG, PNG, or GIF), got unknown format starting with '\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000' [[{{node DecodeJpeg}}]]

There are files in this folder that aren't images, but they should have been filtered out by my previous step. Does anyone have an idea of what's going on?

import tensorflow as tf

# myFolder is the path to the image folder (defined elsewhere)
test_files_ds = tf.data.Dataset.list_files(myFolder + '/*.jpg')

AUTOTUNE = tf.data.experimental.AUTOTUNE


def process_unlabeled_img(file_path):
    # read the raw bytes, decode them as JPEG, scale to [0, 1] and resize
    img = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(images=img, size=(224, 224))
    return file_path, img
Nicolas Gervais asked Jun 23 '20 22:06


1 Answer

It's hard to know exactly what is going on without having the file at hand, but what is probably happening is that some files in your dataset have a .jpg, .jpeg, .png or .gif extension without actually being JPEG, PNG or GIF images. The error message shows that the offending file starts with null bytes, which is not a valid header for any of these formats, so TensorFlow can't decode it.

One way to overcome this problem is to check the files that are supposed to be images and get rid of the ones that aren't actual JPEG, PNG or GIF images.

Checking whether a file is a valid JPEG, PNG or GIF image is more complicated than it seems, but checking the file signature / magic number (that is, the first few bytes of the file) is a good start and should solve the problem most of the time.

In practice, there are many ways to do this. One is to check each file individually with a function along these lines:

def is_image(filename, verbose=False):
    """Return True if the file starts with a JPEG, PNG or GIF signature."""

    # read only the first few bytes; the with-block closes the file afterwards
    with open(filename, 'rb') as f:
        data = f.read(10)

    # check if file is JPG or JPEG
    if data[:3] == b'\xff\xd8\xff':
        if verbose:
            print(filename + " is: JPG/JPEG.")
        return True

    # check if file is PNG
    if data[:8] == b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a':
        if verbose:
            print(filename + " is: PNG.")
        return True

    # check if file is GIF (GIF87a or GIF89a)
    if data[:6] in [b'\x47\x49\x46\x38\x37\x61', b'\x47\x49\x46\x38\x39\x61']:
        if verbose:
            print(filename + " is: GIF.")
        return True

    return False

You would then be able to get rid of the invalid images with something like this (note that this permanently deletes them):

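import os

# go through all files in desired folder (folder is the path to your image directory)
for filename in os.listdir(folder):
    # build the full path, since os.listdir only returns bare file names
    path = os.path.join(folder, filename)
    # check if file is actually an image file (skip subdirectories)
    if os.path.isfile(path) and not is_image(path, verbose=False):
        # if the file is not valid, remove it
        os.remove(path)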
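Since deleting files cannot be undone, it may be worth doing a dry run first. The following is only a minimal sketch and not part of the approach above: has_image_signature and find_invalid_images are hypothetical helper names, and the signature check simply mirrors the one in is_image. It returns the paths that would be removed without touching them.

```python
import os

# Magic numbers for the formats discussed above.
SIGNATURES = (
    b'\xff\xd8\xff',        # JPEG
    b'\x89PNG\r\n\x1a\n',   # PNG
    b'GIF87a',              # GIF (1987 version)
    b'GIF89a',              # GIF (1989 version)
)


def has_image_signature(path):
    """Return True if the file starts with a JPEG, PNG or GIF signature."""
    with open(path, 'rb') as f:
        header = f.read(8)
    # bytes.startswith accepts a tuple and matches any of its entries
    return header.startswith(SIGNATURES)


def find_invalid_images(folder):
    """List regular files in `folder` whose header is not a known signature."""
    invalid = []
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        if os.path.isfile(path) and not has_image_signature(path):
            invalid.append(path)
    return invalid
```

You could print the result of find_invalid_images(folder), inspect a few of the listed files by hand, and only then run the deletion loop.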

Now, as I said, this will probably solve your problem, but please note that is_image cannot tell for sure whether a file can be read as a JPG, JPEG, PNG or GIF image. It is only a quick and dirty solution that will get rid of the vast majority of errors of this kind, but not all of them.
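To illustrate that limitation, here is a small sketch (my own illustration, with hypothetical names starts_like_jpeg, make_fake_jpeg and fake.jpg) that writes a file with a valid JPEG signature followed by nothing but null bytes. A signature-only check accepts it even though it contains no image data.

```python
import os
import tempfile


def starts_like_jpeg(path):
    """Signature-only check: True if the file begins with the JPEG magic number."""
    with open(path, 'rb') as f:
        return f.read(3) == b'\xff\xd8\xff'


def make_fake_jpeg(folder):
    """Write a file that has a JPEG signature but no real image data."""
    path = os.path.join(folder, 'fake.jpg')
    with open(path, 'wb') as f:
        f.write(b'\xff\xd8\xff' + b'\x00' * 64)
    return path


with tempfile.TemporaryDirectory() as tmp:
    fake = make_fake_jpeg(tmp)
    passed = starts_like_jpeg(fake)   # the header looks fine...
    size = os.path.getsize(fake)      # ...but the body is just 64 null bytes
    print(passed, size)
```

A stricter approach would be to attempt a full decode of each file with the same decoder the pipeline uses and to treat any decoding error as a sign that the file is invalid.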

bglbrt answered Oct 30 '22 08:10