I am quite new to TensorFlow and would like to understand clearly what the code below does.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import os

num_skipped = 0
for folder_name in ("Cat", "Dog"):
    print("folder_name:", folder_name)  # folder_name: Cat
    folder_path = os.path.join("Dataset/PetImages", folder_name)
    print("folder_path:", folder_path)  # folder_path: Dataset/PetImages/Cat
    for fname in os.listdir(folder_path):
        print("fname:", fname)  # fname: 5961.jpg
        fpath = os.path.join(folder_path, fname)
        print("fpath:", fpath)  # fpath: Dataset/PetImages/Cat/10591.jpg
        try:
            fobj = open(fpath, "rb")
            is_jfif = tf.compat.as_bytes("JFIF") in fobj.peek(10)
        finally:
            fobj.close()

        if not is_jfif:
            num_skipped += 1
            # Delete corrupted image
            os.remove(fpath)

print("Deleted %d images" % num_skipped)
The Keras website comments on the above code:
When working with lots of real-world image data, corrupted images are a common occurrence. Let's filter out badly-encoded images that do not feature the string "JFIF" in their header.
Specifically, I want to know what the line below does, and how it does it:
is_jfif = tf.compat.as_bytes("JFIF") in fobj.peek(10)
I checked the API documentation but could not clearly understand it.
A better explanation would be a great help.
Thanks
Wikipedia explains that JPEG files in the JFIF format contain the string "JFIF" near the beginning of the file, encoded as bytes.
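To make the header layout concrete: a JFIF file starts with the JPEG start-of-image marker, followed by an APP0 segment whose identifier is the ASCII text "JFIF" plus a null byte. The sketch below builds such a header by hand to show where the string sits. The `sample_header` bytes are an illustrative example rather than bytes from a real image, and `jfif_offset` is a hypothetical helper name.

```python
# Illustrative first 12 bytes of a JFIF file (hand-built, not from a real image).
sample_header = bytes([
    0xFF, 0xD8,                    # SOI: start-of-image marker
    0xFF, 0xE0,                    # APP0 marker
    0x00, 0x10,                    # APP0 segment length (16)
    0x4A, 0x46, 0x49, 0x46, 0x00,  # "JFIF" followed by a null byte
    0x01,                          # first byte of the version field
])


def jfif_offset(data: bytes) -> int:
    """Return the index of b"JFIF" in data, or -1 if it is absent."""
    return data.find(b"JFIF")


print(jfif_offset(sample_header))      # 6
print(b"JFIF" in sample_header[:10])   # True
```

With this layout the four letters occupy bytes 6 through 9, which is why looking at the first 10 bytes is enough to find them.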

So:
tf.compat.as_bytes("JFIF") converts the string "JFIF" to bytes. You could also just use b"JFIF", though maybe the TensorFlow implementation has some optimization I don't know about.
fobj.peek(10) theoretically returns the first 10 bytes of the file without advancing the read position, but in practice it often returns more (whatever is in the read buffer, which for a small file can be the entire file).
is_jfif then just checks whether the converted "JFIF" bytes appear anywhere in the result of fobj.peek.

The command converts the given string ("JFIF") to bytes and checks whether it is present in the first bytes of the file object. The 10 passed to peek is a requested byte count, not a position. It's a quick check that verifies the content of the header.
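The behaviour described above can be tried without TensorFlow or a real image. This is a minimal sketch under stated assumptions: `fake_jpeg` is a hand-built stand-in for a JPEG file, and the plain literal b"JFIF" is used in place of tf.compat.as_bytes("JFIF"), which encodes the string as UTF-8 and so yields the same bytes.

```python
import os
import tempfile

# Hand-built stand-in for a JPEG: a JFIF-style header plus filler bytes.
fake_jpeg = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00" + b"\x00" * 100

# Write the bytes to a temporary file so there is something to open.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(fake_jpeg)
    tmp_path = tmp.name

try:
    with open(tmp_path, "rb") as fobj:
        peeked = fobj.peek(10)              # look ahead without consuming data
        position_after_peek = fobj.tell()   # still at the start of the file
        is_jfif = b"JFIF" in peeked         # bytes-in-bytes substring test
finally:
    os.remove(tmp_path)

print(len(peeked) >= 10)     # peek may return more than the 10 bytes requested
print(position_after_peek)   # 0
print(is_jfif)               # True
```

Because `in` on bytes is a substring search, the check passes wherever "JFIF" appears in the returned bytes, not only at one fixed offset.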
This would not be my first choice for dealing with "corrupted data"; normally you'd leave that to a module that knows much more about image handling. It's a tutorial though, so the focus was on brevity and on highlighting a problem, rather than on providing a comprehensive solution.
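Given that caveat, one cautious option is to list the suspect files first and decide what to do with them afterwards, instead of deleting inside the loop. The sketch below uses only the standard library and still performs just the header check; it does not decode the image, so it is not a substitute for an image library. `find_non_jfif` is a hypothetical helper name, and the demo files are made up for illustration.

```python
import os
import tempfile


def find_non_jfif(folder_path):
    """Return paths of files in folder_path lacking b"JFIF" in their first bytes."""
    suspects = []
    for fname in sorted(os.listdir(folder_path)):
        fpath = os.path.join(folder_path, fname)
        if not os.path.isfile(fpath):
            continue  # skip sub-directories
        with open(fpath, "rb") as fobj:  # the with block closes the file for us
            if b"JFIF" not in fobj.peek(10):
                suspects.append(fpath)
    return suspects


# Small demonstration on a throwaway folder with one good and one bad file.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "good.jpg"), "wb") as f:
        f.write(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00" + b"\x00" * 20)
    with open(os.path.join(demo_dir, "bad.jpg"), "wb") as f:
        f.write(b"not an image at all")
    flagged = [os.path.basename(p) for p in find_non_jfif(demo_dir)]

print(flagged)  # ['bad.jpg']
```

Nothing is removed here; the returned list can be reviewed, and the files deleted with os.remove once the result looks right.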