I am quite new to TensorFlow and would like to understand clearly what the code below does.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import os

num_skipped = 0
for folder_name in ("Cat", "Dog"):
    print("folder_name:", folder_name)  # folder_name: Cat
    folder_path = os.path.join("Dataset/PetImages", folder_name)
    print("folder_path:", folder_path)  # folder_path: Dataset/PetImages/Cat
    for fname in os.listdir(folder_path):
        print("fname:", fname)  # fname: 5961.jpg
        fpath = os.path.join(folder_path, fname)
        print("fpath:", fpath)  # fpath: Dataset/PetImages/Cat/10591.jpg
        try:
            fobj = open(fpath, "rb")
            is_jfif = tf.compat.as_bytes("JFIF") in fobj.peek(10)
        finally:
            fobj.close()

        if not is_jfif:
            num_skipped += 1
            # Delete corrupted image
            os.remove(fpath)

print("Deleted %d images" % num_skipped)
The Keras website comments on the above code:
When working with lots of real-world image data, corrupted images are a common occurrence. Let's filter out badly-encoded images that do not feature the string "JFIF" in their header.
Specifically, I want to know what the line below does, and how it does it:
is_jfif = tf.compat.as_bytes("JFIF") in fobj.peek(10)
I checked the API documentation but could not clearly understand it.
A better explanation would be very helpful.
Thanks
Wikipedia explains that JPEG files in the JFIF format contain the string "JFIF" near the beginning of the file, encoded as bytes.
So:
tf.compat.as_bytes("JFIF") converts the string "JFIF" to bytes. You could also just use b"JFIF", though maybe the TensorFlow implementation has some optimization I don't know about.
fobj.peek(10) theoretically returns the first 10 bytes of the file, but in practice it often returns more: up to everything currently in the read buffer, which for a small file can be the entire file.
is_jfif then just checks whether the converted "JFIF" string is in the result of fobj.peek.

The command converts the given string (JFIF) to bytes and checks whether it is present in the bytes that peek(10) returns from the start of the file object. It's a quick check that verifies the content of the header.
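To see the check in isolation, here is a minimal sketch that uses only the Python standard library. The helper name has_jfif_marker and the hand-built header bytes are assumptions made for illustration; they are not part of the Keras tutorial. The sketch uses b"JFIF" directly, which is the same value that encoding the string "JFIF" as UTF-8 produces.

```python
import os
import tempfile


def has_jfif_marker(path, n=10):
    """Return True if b"JFIF" occurs in the bytes that peek(n) returns.

    Hypothetical helper for illustration; it mirrors the tutorial's check.
    """
    with open(path, "rb") as fobj:
        # peek() looks at buffered bytes without advancing the file position.
        # It may return fewer or more than n bytes.
        head = fobj.peek(n)
        return b"JFIF" in head


# Hand-built sample data (assumed for this demo):
# SOI marker, APP0 marker, segment length, then the identifier "JFIF\0".
jfif_like = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00" + b"\x00" * 32
png_like = b"\x89PNG\r\n\x1a\n" + b"\x00" * 32

with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.jpg")
    bad_path = os.path.join(tmp, "bad.jpg")
    with open(good_path, "wb") as f:
        f.write(jfif_like)
    with open(bad_path, "wb") as f:
        f.write(png_like)

    good_result = has_jfif_marker(good_path)
    bad_result = has_jfif_marker(bad_path)

    # Show that peeking does not move the read position.
    with open(good_path, "rb") as fobj:
        fobj.peek(10)
        position_after_peek = fobj.tell()

print("good file has marker:", good_result)
print("bad file has marker:", bad_result)
print("position after peek:", position_after_peek)
```

Because peek can return more than the requested number of bytes, the membership test may match "JFIF" anywhere in the returned buffer, not only within the first 10 bytes. Reading a fixed-size slice instead would make the check stricter, at the cost of advancing the file position.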
This would not be my first choice for dealing with "corrupted data"; normally you'd leave that to a module that knows much more about image handling. It's a tutorial though, so the focus was on brevity and on highlighting a problem, rather than on providing a comprehensive solution.