TensorFlow: Removing Non-JFIF Images

I am quite new to TensorFlow and would like to understand clearly what the code below does.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import os

num_skipped = 0
for folder_name in ("Cat", "Dog"):
    print("folder_name:",folder_name) #folder_name: Cat
    folder_path = os.path.join("Dataset/PetImages", folder_name)
    print("folder_path:",folder_path) #folder_path: Dataset/PetImages/Cat
    for fname in os.listdir(folder_path):
        print("fname:",fname) #fname: 5961.jpg
        fpath = os.path.join(folder_path, fname)
        print("fpath:", fpath) #fpath: Dataset/PetImages/Cat/10591.jpg
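        # "with" closes the file even on error (the old try/finally raised
        # NameError if open() itself failed, because fobj was never bound)
        with open(fpath, "rb") as fobj:
            is_jfif = tf.compat.as_bytes("JFIF") in fobj.peek(10)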

        if not is_jfif:
            num_skipped += 1
            # Delete corrupted image
            os.remove(fpath)

print("Deleted %d images" % num_skipped)

The Keras website comments on the above code:

When working with lots of real-world image data, corrupted images are a common occurrence. Let's filter out badly-encoded images that do not feature the string "JFIF" in their header.

Specifically, I want to know what the following line does and how it does it:

 is_jfif = tf.compat.as_bytes("JFIF") in fobj.peek(10)

I checked the API documentation but wasn't able to understand it clearly.

A clearer explanation would be a great help.

Thanks

Joker asked Jan 01 '23 01:01



2 Answers

Wikipedia explains that JPEG files in the JFIF format contain the string "JFIF", encoded as bytes, near the beginning of the file (in the APP0 header segment):

[Image: JFIF header]
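To make the layout concrete, here is a minimal sketch that builds the first bytes of a JFIF header by hand and shows where "JFIF" sits. The byte values are a hypothetical sample following the standard APP0 layout, not bytes read from a real file.

```python
# Hypothetical sample: the first bytes of a typical JFIF-encoded JPEG.
header = bytes([
    0xFF, 0xD8,          # SOI (start of image) marker
    0xFF, 0xE0,          # APP0 marker
    0x00, 0x10,          # APP0 segment length (16, big-endian)
]) + b"JFIF\x00"         # identifier: "JFIF" followed by a NUL byte

# "JFIF" starts at offset 6, so it lies entirely inside the first 10 bytes.
offset = header.find(b"JFIF")
print(offset)                    # 6
print(b"JFIF" in header[:10])    # True
```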

So:

  • tf.compat.as_bytes("JFIF") converts the string "JFIF" to bytes (UTF-8 encoded by default), producing b"JFIF". You could also just write b"JFIF" directly; the resulting value is the same.
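  • fobj.peek(10) returns bytes from the start of the file without advancing the read position. The 10 is only a hint: per the Python docs, peek may return fewer or more bytes than requested, and in practice it usually returns whatever is in the internal read buffer (often several kilobytes), which for small files is the entire file. In a standard JFIF file, "JFIF" sits at byte offsets 6 to 9, so it falls inside the first 10 bytes.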
  • The in operator then checks whether b"JFIF" appears anywhere in the bytes returned by fobj.peek, so is_jfif is True if the marker is found and False otherwise.
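A small standard-library sketch of the three points above, assuming a hypothetical 11-byte file that contains only a JFIF header. It uses "JFIF".encode("utf-8") in place of tf.compat.as_bytes("JFIF") (same bytes, no TensorFlow needed), shows that peek() leaves the read position at 0, and applies the same substring test as the tutorial.

```python
import os
import tempfile

# Same bytes that tf.compat.as_bytes("JFIF") returns with its default encoding.
marker = "JFIF".encode("utf-8")

# Hypothetical file contents: just the start of a JFIF header.
data = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    with open(path, "rb") as fobj:          # fobj is an io.BufferedReader
        peeked = fobj.peek(10)              # may be more or fewer than 10 bytes
        position_after_peek = fobj.tell()   # still 0: peek does not consume bytes
        is_jfif = marker in peeked          # plain substring search on bytes
finally:
    os.remove(path)

print(position_after_peek, is_jfif)   # 0 True
```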
jdaz answered Jan 02 '23 14:01



The command converts the given string ("JFIF") to bytes and checks whether it appears within the first bytes of the file, as returned by peek(10). It's a quick check that verifies the content of the header.
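The tutorial's test accepts "JFIF" anywhere in the peeked bytes, not at a fixed position. A stricter variant, sketched here under the assumption that the file follows the standard APP0 layout, compares the exact bytes at their expected offsets. The helper name has_jfif_app0 and the sample byte strings are hypothetical.

```python
def has_jfif_app0(header: bytes) -> bool:
    """Hypothetical helper: True if the header starts with the SOI and APP0
    markers and has the JFIF identifier (b"JFIF" plus a NUL) at offset 6."""
    return (
        header[0:4] == b"\xff\xd8\xff\xe0"    # SOI marker, then APP0 marker
        and header[6:11] == b"JFIF\x00"       # identifier after the 2-byte length
    )

good = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"
shifted = b"xxJFIF" + b"\x00" * 10   # contains "JFIF", but is not a JPEG header

print(b"JFIF" in shifted[:10], has_jfif_app0(shifted))  # True False
print(has_jfif_app0(good))                              # True
```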

It's not my first choice when it comes to dealing with "corrupted data"; normally you'd leave this to a module that knows much more about image handling. It's a tutorial, though, so the focus was on brevity and on highlighting the problem, rather than on providing a comprehensive solution.
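To illustrate why a header test is a weak corruption check, the following sketch uses two hypothetical byte strings, with a 10-byte slice standing in for peek(10) (the real call may return more). A JPEG that starts with an Exif (APP1) segment instead of a JFIF one is flagged as bad, while a JFIF file cut off right after its header passes.

```python
# Hypothetical byte strings illustrating the limits of the "JFIF" test.
exif_jpeg_start = b"\xff\xd8\xff\xe1\x00\x16Exif\x00\x00"  # Exif (APP1) header, no "JFIF"
truncated_jfif = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"        # JFIF header, image data missing

def tutorial_check(first_bytes: bytes) -> bool:
    # Same test as the tutorial, with b"JFIF" in place of tf.compat.as_bytes("JFIF").
    return b"JFIF" in first_bytes[:10]

print(tutorial_check(exif_jpeg_start))  # False: this file would be deleted
print(tutorial_check(truncated_jfif))   # True: this file would be kept
```

A check that actually decodes each file with an image library would be expected to catch both cases, at the cost of reading every file in full.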

Lukasz Tracewski answered Jan 02 '23 13:01
