Efficient way to fingerprint an image (jpg, png, etc)?

Is there an efficient way to get a fingerprint of an image for duplicate detection?

That is, given an image file, say a JPG or PNG, I'd like to quickly calculate a value that identifies the image content and is fairly resilient to other aspects of the file (e.g. the image metadata) changing. If it also copes with resizing, that's even better.

[Update] Regarding the metadata in JPG files, does anyone know whether it's stored in a specific part of the file? I'm looking for an easy way to ignore it, e.g. can I skip the first x bytes of the file, or take x bytes from the end of the file, to be sure I'm not reading metadata?

asked Aug 11 '09 17:08

Parand

2 Answers

A stab in the dark, if you are looking to get around metadata and size-related differences:

Edge detection and scale-independent comparison
Sampling and statistical analysis of grayscale/RGB values (average luminance, averaged color map)
FFT and other transforms (a good article: Classification of Fingerprints using FFT)

And numerous others.

Basically:

Convert the JPG/PNG/GIF, or whatever it is, into an RGB byte array, which is independent of the encoding
Use a fuzzy pattern classification method to generate a 'hash of the pattern' in the image, not a hash of the RGB array as some suggest
Then you want a distributed method of fast hash comparison that matches, within a threshold, on the encapsulated hash or encoding of the pattern. Erlang would be good for this :)
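As one possible illustration of the 'hash of the pattern' idea, here is a minimal sketch of an average hash, a simple technique swapped in for illustration rather than the classifier described above. It assumes the image has already been decoded into a 2D list of grayscale values (0-255); the function names, the 8x8 grid and the threshold of 5 bits are all assumptions, not anything prescribed by this answer.

```python
def box_downsample(pixels, size=8):
    """Shrink a 2D grayscale grid to size x size by averaging blocks."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0 = by * height // size
        y1 = max((by + 1) * height // size, y0 + 1)  # never an empty block
        row = []
        for bx in range(size):
            x0 = bx * width // size
            x1 = max((bx + 1) * width // size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: 1 where a cell is brighter than the mean."""
    flat = [v for row in box_downsample(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def looks_like_duplicate(a, b, threshold=5):
    """Fuzzy match: treat hashes within `threshold` bits as the same image.

    The threshold is an assumed value and would need tuning on real data.
    """
    return hamming_distance(a, b) <= threshold
```

Because the hash is computed from a tiny downsampled grid, it ignores metadata entirely and tends to survive resizing and uniform brightness changes. It is probabilistic, so false positives are still possible.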

Advantages are:

If you use any AI/training, it will spot duplicates regardless of encoding, size, aspect ratio, hue and luminance modification, dynamic range/subsampling differences and, in some cases, perspective

Disadvantages:

Can be hard to code; something like OpenCV might help
Probabilistic: false positives are likely, but can be reduced with neural networks and other AI
Slow, unless you can encapsulate pattern qualities and distribute the search (MapReduce style)

Check out image analysis books such as:

Pattern Classification 2ed
Image Processing Fundamentals
Image Processing - Principles and Applications

And others

If you are not scaling the image, then things are simpler. If you are, then you have to contend with the fact that scaling is lossy in more ways than sample reduction.

answered Oct 22 '22 19:10

Aiden Bell

Comparing the byte size of the image files would be suitable for many applications. Another way would be to:

Strip out the metadata.
Calculate the MD5 (or another suitable hash) of what remains of the image.
Compare that to the MD5 (or whatever you used) of the potential dupe image, provided you've stripped out the metadata for that one too.
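For JPG files specifically, the steps above can be sketched as follows. Metadata is not at a fixed byte offset; it normally sits in APPn segments (markers 0xFFE0 to 0xFFEF, where EXIF and JFIF data live) and COM segments (0xFFFE) before the compressed scan data, so the file has to be walked segment by segment. This is a minimal sketch under assumptions: the function names are made up for illustration, everything from the SOS marker onward is treated as image data, and dropping every APPn segment is only meant for fingerprinting, not for producing a file you would keep.

```python
import hashlib
import struct


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return the JPEG byte stream without APPn and COM segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == 0xDA:
            # SOS: assume the rest of the file is compressed image data.
            out += data[pos:]
            return bytes(out)
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError("invalid segment length at offset %d" % pos)
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    raise ValueError("no SOS marker found; file may be truncated")


def content_fingerprint(data: bytes) -> str:
    """MD5 of the JPEG stream with metadata segments removed."""
    return hashlib.md5(strip_jpeg_metadata(data)).hexdigest()
```

Two files with the same fingerprint carry the same encoded image even if their EXIF tags or comments differ. Any re-encoding or resizing changes the scan data, so this approach only catches byte-identical image content.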

answered Oct 22 '22 19:10

karim79
