Is it possible to detect duplicate image files?

Tags: python, image

I have over 10K product image files, and the problem is that many of the images are duplicates.

If there is no image, there is a standard image that says 'no image'.

How can I detect if the image is this standard 'no image' image file?

Update: The image has a different name, but it is otherwise exactly the same image.

People are suggesting a hash, so would I do something like this?

import hashlib

# Hash the raw bytes of the file; 'path' is the image file to check.
with open(path, 'rb') as f:
    digest = hashlib.md5(f.read()).hexdigest()
Blankman asked Aug 01 '10 21:08


1 Answer

As a sidenote, for images, I find raster data hashes to be far more effective than file hashes.

ImageMagick provides a reliable way to compute such hashes, and several Python bindings are available. Because the hash is computed from the raster (pixel) data rather than the file bytes, it helps detect identical images that were saved with different lossless compression or different metadata.

Usage example:

>>> import PythonMagick
>>> img = PythonMagick.Image("image.png")
>>> img.signature()
'e11cfe58244d7cf98a79bfdc012857a9391249dca3aedfc0fde4528eed7f7ba7'
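
To apply this to the 'no image' placeholder from the question, one option is to compute the placeholder's hash once and compare every product image against it. The following is only a minimal sketch under some assumptions: `file_digest`, `find_matches`, `group_duplicates`, and the `digest` parameter are hypothetical names, and the default hashes raw file bytes with `hashlib` so that it runs with the standard library alone. To match on raster data instead, a function that returns `PythonMagick.Image(path).signature()` could be passed as `digest`.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_matches(reference_path, candidate_paths, digest=file_digest):
    """Return the candidates whose digest equals that of the reference file."""
    reference = digest(reference_path)
    return [p for p in candidate_paths if digest(p) == reference]


def group_duplicates(paths, digest=file_digest):
    """Group paths by digest and keep only groups with more than one file."""
    groups = {}
    for p in paths:
        groups.setdefault(digest(p), []).append(p)
    return [g for g in groups.values() if len(g) > 1]
```

Calling `find_matches` with the path of the placeholder and a list of product image paths would return the products that still use the placeholder, while `group_duplicates` covers the more general case of finding any repeated image. With the default byte-level digest, two files match only if they are byte-for-byte identical, which fits the case where only the file name differs.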
Daniel Kluev answered Sep 22 '22 12:09