I have over 10K image files for products. The problem is that many of the images are duplicates.
If there is no image, there is a standard image that says 'no image'.
How can I detect if the image is this standard 'no image' image file?
Update: The image has a different file name, but it is otherwise exactly the same image.
People are suggesting a hash, so would I do something like this?
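import hashlib
data = file.read()  # `file` is an already-open binary file object
digest = hashlib.md5(data).hexdigest()  # hash the raw bytes; no need to decode the image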
As a side note, for images I find hashes of the raster data to be far more effective than file hashes.
ImageMagick provides a reliable way to compute such hashes, and several Python bindings are available. A raster hash detects identical images even when they differ in lossless compression or metadata.
Usage example:
>>> import PythonMagick
>>> img = PythonMagick.Image("image.png")
>>> img.signature()
'e11cfe58244d7cf98a79bfdc012857a9391249dca3aedfc0fde4528eed7f7ba7'
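Since the update says the placeholder copies are byte-for-byte identical apart from their names, a plain file hash should also be enough for this specific case. The following is a minimal sketch using only the standard library; `file_digest`, `find_placeholders`, and the directory layout are hypothetical names chosen for illustration, and SHA-256 is used in place of MD5 simply as a safe default.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's raw bytes, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_placeholders(image_dir, placeholder_path):
    """Return sorted paths under image_dir whose bytes match the placeholder.

    Hypothetical helper: assumes the 'no image' file is available on disk
    at placeholder_path and that copies are byte-identical to it.
    """
    target = file_digest(placeholder_path)
    return sorted(
        path
        for path in Path(image_dir).rglob("*")
        if path.is_file() and file_digest(path) == target
    )
```

Calling `find_placeholders("products", "no_image.png")` (both paths are placeholders for your own) would list every file whose contents match the standard image, regardless of its name. If the reference file itself lives inside the scanned directory, it will appear in the result too. This approach will miss copies that were re-saved with different compression or metadata, which is where the raster signature above is the better tool.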