Is there an efficient way to get a fingerprint of an image for duplicate detection?
That is, given an image file, say a JPG or PNG, I'd like to be able to quickly calculate a value that identifies the image content and is fairly resilient to other aspects of the image (e.g. the image metadata) changing. If it deals with resizing, that's even better.
[Update] Regarding the metadata in JPG files, does anyone know if it's stored in a specific part of the file? I'm looking for an easy way to ignore it, e.g. can I skip the first x bytes of the file or take x bytes from the end of the file to ensure I'm not getting metadata?
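On the metadata question: as far as I know, a JPEG file is a sequence of marker segments rather than a fixed-size header, so there is no constant "first x bytes" to skip. Metadata such as Exif normally sits in APPn segments (markers 0xFFE0 to 0xFFEF) and comments in COM segments (0xFFFE), each carrying its own two-byte length. Below is a rough sketch, not a definitive implementation, that walks the segments and hashes everything except those. The function name `jpeg_content_digest` is made up for illustration, and the sketch assumes a well-formed file. Note that some APPn segments (e.g. JFIF or Adobe) can affect how the image is decoded, and that this only catches byte-identical image data, not re-encoded or resized copies.

```python
import hashlib
import struct


def jpeg_content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of a JPEG, skipping APPn and COM segments.

    Hypothetical helper: assumes a well-formed JPEG byte string.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    digest = hashlib.sha256()
    pos = 2  # skip the SOI marker
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte before a marker; step over it.
            pos += 1
            continue
        if marker == 0xDA:
            # Start of scan: hash the scan header and all compressed data after it.
            digest.update(data[pos:])
            break
        # The length field counts itself (2 bytes) but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            digest.update(data[pos:pos + 2 + length])
        pos += 2 + length
    return digest.hexdigest()
```

Usage would be along the lines of `jpeg_content_digest(open(path, "rb").read())`; two files that differ only in Exif tags or comments should then produce the same digest.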
A stab in the dark, if you are looking to get around metadata and size-related differences:
And numerous others.
Basically:
Advantages are:
Disadvantages:
Check out image analysis books such as:
And others
If you are scaling the image, then things are simpler. If not, then you have to contend with the fact that scaling is lossy in more ways than sample reduction.
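To make the scaling idea concrete, here is a minimal sketch of one common approach, usually called an "average hash". This is a swapped-in technique for illustration, not necessarily what the answer above had in mind. It assumes the image has already been decoded into a grid of grayscale values (a list of rows of ints from 0 to 255); the decoding itself would need an imaging library and is left out. The names `average_hash` and `hamming_distance` are made up for this example.

```python
def average_hash(pixels, hash_size=8):
    """Reduce a grayscale pixel grid to a hash_size*hash_size bit fingerprint.

    pixels: list of rows, each a list of ints in 0..255 (assumed already decoded).
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for cy in range(hash_size):
        # Row range covered by this cell (at least one row).
        y0 = cy * height // hash_size
        y1 = max((cy + 1) * height // hash_size, y0 + 1)
        for cx in range(hash_size):
            # Column range covered by this cell (at least one column).
            x0 = cx * width // hash_size
            x1 = max((cx + 1) * width // hash_size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if the cell is brighter than the overall mean.
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a, b):
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two images would then be treated as likely duplicates when `hamming_distance(average_hash(p1), average_hash(p2))` is small; what counts as small is a threshold you would have to tune. Because each cell is an average over a block of pixels, the fingerprint changes little under resizing, though it will not survive cropping or rotation.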
Using the byte size of the image for comparison would be suitable for many applications. Another way would be to: