I want to store information of PIL images in a key-value store. For that, I hash the image and use the hash as a key.
I have been using the following code to compute the hash:
import hashlib

def hash(img):
    return hashlib.md5(img.tobytes()).hexdigest()
But it seems like this is not stable. I have not figured out why, but for the same image on different machines, I get different hashes.
Is there a simple way of hashing images that only depends on the image itself (not on timestamps, system architecture, etc.)?
Note that I do not need similar images to get a similar/same hash, as in image hashing. In fact, I want different images to have a different hash, e.g. changing the brightness of the image should change its hash.
I'm guessing your goal is to perform image hashing in Python (which is quite different from classic hashing, since the byte representation of an image depends on its format, resolution, etc.).
One common image hashing technique is average hashing. Note that it is not 100% accurate, but it works well in most cases.
First we simplify the image by reducing its size and color depth; cutting down the image's complexity greatly improves the accuracy of comparisons against other images:
Reducing size:
img = img.resize((10, 10), Image.Resampling.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
Reducing colors (converting to grayscale):
img = img.convert("L")
Then we find the average pixel value of the image (the core of average hashing):
pixel_data = list(img.getdata())
avg_pixel = sum(pixel_data)/len(pixel_data)
Finally, the hash is computed: we compare each pixel in the image to the average pixel value. If a pixel is greater than or equal to the average, we record a 1; otherwise a 0. Then we convert these bits to a base-16 representation:
bits = "".join(['1' if (px >= avg_pixel) else '0' for px in pixel_data])
hex_representation = hex(int(bits, 2))[2:][::-1].upper()
If you want to compare this image to other images, perform the steps above on each and measure the similarity between the hexadecimal representations of their average hashes. You can use something as simple as Hamming distance, or more complex measures such as Levenshtein distance, Ratcliff/Obershelp pattern matching (SequenceMatcher), cosine similarity, etc.
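For instance, Hamming distance over two hashes can be computed with a small helper (a sketch; it assumes both hashes are hex strings of equal length):

```python
def hamming_distance(hash1, hash2):
    # Expand each hex digest back to its bit string, padded to full width
    b1 = bin(int(hash1, 16))[2:].zfill(len(hash1) * 4)
    b2 = bin(int(hash2, 16))[2:].zfill(len(hash2) * 4)
    # Count positions where the bits differ
    return sum(c1 != c2 for c1, c2 in zip(b1, b2))
```

A distance of 0 means identical hashes; a small distance (relative to the number of bits) suggests visually similar images.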