Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Image hashing

Tags:

python

hash

I'm currently trying to get a hash from an image in python, i have successfully done this and it works somewhat.

However, I have this issue: Image1 and image2 end up having the same hash, even though they are different. I need a form of hashing which is more accurate and precise.

Image1 = Image1

Image2 = Image2

The hash for the images is: faf0761493939381

I am currently using from PIL import Image import imagehash

And imagehash.average_hash

Code here

import os
from PIL import Image
import imagehash


def checkImage():
    for filename in os.listdir('images//'):
        hashedImage = imagehash.average_hash(Image.open('images//' + filename))
    print(filename, hashedImage)

    for filename in os.listdir('checkimage//'):
        check_image = imagehash.average_hash(Image.open('checkimage//' + filename))
    print(filename, check_image)

    if check_image == hashedImage:
        print("Same image")
    else:
        print("Not the same image")

    print(hashedImage, check_image)


checkImage()
like image 202
Logan Avatar asked Jun 17 '26 01:06

Logan


2 Answers

Try using hashlib. Just open the file and perform a hash.

import hashlib
# Simple solution
with open("image.extension", "rb") as f:
    hash = hashlib.sha256(f.read()).hexdigest()
# General-purpose solution that can process large files
def file_hash(file_path):
    # https://stackoverflow.com/questions/22058048/hashing-a-file-in-python

    sha256 = hashlib.sha256()

    with open(file_path, "rb") as f:
        while True:
            data = f.read(65536) # arbitrary number to reduce RAM usage
            if not data:
                break
            sha256.update(data)

    return sha256.hexdigest()

Thanks to Antonín Hoskovec for pointing out that it should be read binary (rb), not simple read (r)!

like image 51
ericl16384 Avatar answered Jun 18 '26 14:06

ericl16384


By default, imagehash checks if image files are nearly identical. The files you are comparing are more similar than they are not. If you want a more or less unique way of fingerprinting files you can use a different approach, such as employing a cryptographic hashing algorithm:

import hashlib

def get_hash(img_path):
    # This function will return the `md5` checksum for any input image.
    with open(img_path, "rb") as f:
        img_hash = hashlib.md5()
        while chunk := f.read(8192):
           img_hash.update(chunk)
    return img_hash.hexdigest()
like image 36
Deneb Avatar answered Jun 18 '26 16:06

Deneb



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!