I am taking screenshots of an application, and trying to detect if the exact image has been seen before. I want even trivial changes to count as different - e.g. if there is text in the image, and the spelling changes, that counts as a mismatch.
I've been successfully using an MD5 hash of the contents of a screenshot image to look up in a database of known images, and detect if it has been seen before.
Now, I have ported it to another machine, and despite my attempts to exactly match configurations, I am getting ever-so-slightly different images from those on the older machine. When I say different, the changes are minute - if I blow up the old and new images and flick between them, I can't see a single difference! Nonetheless, ImageMagick's compare command can see a smattering of pixels that are different.
So my MD5 hashes are no longer matching. Rather than a simple MD5 hash, I need an image hash.
Doing my research, I find that most image hashes try to be fairly generous - they accept resized, transformed and watermarked images, with a correspondingly higher rate of false positive matches. I want an image hash that is far more strict - the only changes permitted are minute changes in colour.
Can anyone recommend an image hash library or algorithm? (Not an application, like dupdetector).
Remember: My requirements are different from the many similar questions in that I don't want a liberal algorithm like shrinking or pHash, and I don't want a comparison tool like structural similarity or ImageMagick's compare.
I want a hash that makes very similar images give the same hash value. Is that even possible?
An image hash function maps an image to a short string called an image hash, which can be used for image authentication or as a digital fingerprint. Nevertheless, two visually different images can end up with the same image hash, which is called a collision.
A perceptual hash is a string (hash) produced by a special algorithm from an input picture. It acts as a fingerprint of that picture, and two images can be compared by calculating the Hamming distance between their hashes (which basically counts the number of bits that differ).
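As a rough illustration of a strict image hash that tolerates only minute colour changes, one option is to coarsen each colour channel into buckets before feeding the pixels to MD5. This is only a sketch under assumptions: the function name `quantized_hash`, the `step` parameter and the pixel layout (rows of `(r, g, b)` tuples) are made up for this example and are not taken from any library.

```python
import hashlib

def quantized_hash(pixels, step=16):
    """Hash an image after coarsening each colour channel.

    pixels: iterable of rows, each a sequence of (r, g, b) ints in 0-255
            (a hypothetical in-memory layout chosen for this sketch).
    step:   bucket width; channel values in the same bucket hash identically.
    """
    h = hashlib.md5()
    for row in pixels:
        # Mix in the row length so images of different shapes differ.
        h.update(len(row).to_bytes(4, "big"))
        for r, g, b in row:
            h.update(bytes((r // step, g // step, b // step)))
    return h.hexdigest()

old = [[(200, 100, 50), (10, 20, 30)]]
new = [[(201, 101, 49), (11, 21, 29)]]   # off by one in every channel
edge = [[(200, 100, 50), (16, 20, 30)]]  # 10 -> 16 crosses a bucket boundary

same_hash = quantized_hash(old) == quantized_hash(new)
edge_hash_differs = quantized_hash(old) != quantized_hash(edge)
```

The weakness is visible in the `edge` case: two values that are very close but sit on either side of a bucket boundary (15 and 16 with `step=16`) still produce different hashes, so this reduces mismatches rather than eliminating them.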
Hashing is a powerful tool used by hotlines, law enforcement, industry and other child protection organisations in the removal of Child Sexual Abuse Material (CSAM). This is because it enables known items of CSAM to be detected and removed without requiring them to be assessed again by an analyst.
Perceptual hashes (pHashes) allow two images to be compared by counting the bits that differ between the hash of the input and the hash of the image it is being compared against. This difference is known as the Hamming distance. A very simple way of using this algorithm would be to create a list of all known images and their perceptual hashes.
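The comparison described here can be sketched in a few lines, assuming the hashes are already available as integers. The names `hamming_distance`, `find_match` and `max_distance`, and the example hash values, are hypothetical and chosen only for illustration.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")

def find_match(candidate, known, max_distance=0):
    """Return (name, distance) of the closest known hash within
    max_distance bits, or None if nothing is close enough."""
    best = None
    for name, known_hash in known.items():
        d = hamming_distance(candidate, known_hash)
        if d <= max_distance and (best is None or d < best[1]):
            best = (name, d)
    return best

# Made-up 16-bit hashes standing in for real perceptual hashes.
known_hashes = {"login": 0xF0F0, "home": 0x0F0F}

loose = find_match(0xF0F1, known_hashes, max_distance=1)   # one bit off "login"
strict = find_match(0xF0F1, known_hashes, max_distance=0)  # exact match only
```

A small `max_distance` keeps the matching strict, while `max_distance=0` degenerates to an exact hash lookup.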
You can have a look at the following paper called "Spectral hashing". It is an algorithm that is designed to produce hash codes from images in order to group together similar images (see the retrieval examples at the end of the paper). It is a good starting point.
The link: http://www.cs.huji.ac.il/~yweiss/SpectralHashing/