Image fingerprint to compare similarity of many images

I need to create fingerprints of many images (about 100,000 existing, 1,000 new per day, RGB, JPEG, max size 800x800) so that I can compare every image to every other image very fast. I can't use binary comparison methods, because images that are merely similar, not identical, should also be recognized.

An existing library would be best, but pointers to existing algorithms would also help me a lot.

Philip Dreyer asked Feb 27 '09 19:02



1 Answer

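Normal hashing or CRC calculation algorithms do not work well with image data: changing a single pixel (or re-encoding the JPEG) produces a completely different hash, so they can only detect exact byte-for-byte duplicates. The two-dimensional nature of the information must be taken into account.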
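To illustrate the problem, here is a minimal sketch using Python's standard `hashlib`. The byte buffer is a made-up stand-in for the raw pixels of an 800x800 RGB image, not real image data; changing one channel value of one pixel by 1 yields an unrelated SHA-256 digest, so the hash carries no information about how close the two images are.

```python
import hashlib

# Hypothetical stand-in for 800x800 RGB pixel data (800 * 800 * 3 bytes).
original = bytes(range(256)) * 7500
modified = bytearray(original)
modified[0] ^= 1  # nudge one channel of one pixel by a single level

d1 = hashlib.sha256(original).hexdigest()
d2 = hashlib.sha256(bytes(modified)).hexdigest()

print(d1)
print(d2)  # bears no resemblance to d1, although the "images" differ by one value
```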

If you need extremely robust fingerprinting, such that affine transformations (scaling, rotation, translation, flipping) are accounted for, you can apply a Radon transform to the image to produce a normalized mapping of the image data. Store this with each image and then compare only the fingerprints. This is a complex algorithm and not for the faint of heart.
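As a starting point, the sketch below computes only the sinogram (the raw Radon transform) with a naive, pixel-driven approach: for each angle, every pixel's intensity is added to the bin of its signed distance from the image centre. The function name `radon_sinogram` is hypothetical, and the normalization step that turns a sinogram into an invariant fingerprint (for example, rotating the image by some angle shifts its sinogram along the angle axis by that same angle) is the hard part the answer warns about and is not shown. For real workloads a library implementation such as scikit-image's `skimage.transform.radon` would be much faster.

```python
import math
from typing import List, Sequence


def radon_sinogram(gray: Sequence[Sequence[float]],
                   angles_deg: Sequence[float]) -> List[List[float]]:
    """Naive discrete Radon transform of a grayscale image.

    For each angle theta, pixel (x, y) contributes its intensity to the bin
    s = (x - cx) * cos(theta) + (y - cy) * sin(theta), rounded to the nearest
    integer. Row i of the result is the projection at angles_deg[i]."""
    h, w = len(gray), len(gray[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    offset = math.ceil(math.hypot(w, h) / 2)  # large enough for any |s|
    sinogram = []
    for angle in angles_deg:
        theta = math.radians(angle)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        projection = [0.0] * (2 * offset + 1)
        for y in range(h):
            for x in range(w):
                s = (x - cx) * cos_t + (y - cy) * sin_t
                projection[math.floor(s + 0.5) + offset] += gray[y][x]
        sinogram.append(projection)
    return sinogram


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sino = radon_sinogram(image, [0, 45, 90, 135])
print(sino[0][2:5])  # column sums at 0 degrees: [12.0, 15.0, 18.0]
print(sino[2][2:5])  # row sums at 90 degrees: [6.0, 15.0, 24.0]
```

Every projection preserves the total intensity of the image; only its distribution across the bins changes with the angle.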

A few simpler solutions are possible:

  1. Create a luminosity histogram for the image as a fingerprint
  2. Create scaled down versions of each image as a fingerprint
  3. Combine techniques (1) and (2) into a hybrid approach for improved comparison quality

A luminosity histogram (especially one that is separated into RGB components) is a reasonable fingerprint for an image and can be implemented quite efficiently. Subtracting one histogram from another produces a new histogram, which you can process to decide how similar two images are. Because histograms only evaluate the distribution and occurrence of luminosity/color information, they handle affine transformations quite well: rotating, flipping or translating an image leaves its histogram essentially unchanged, and so does scaling, once the bin counts are normalized by the number of pixels. If you quantize each color component's luminosity information down to an 8-bit value (256 bins per channel) and store each normalized bin in one byte, 768 bytes (3 x 256) are sufficient for the fingerprint of an image of almost any reasonable size.

Luminosity histograms produce false negatives when the color information in an image is manipulated: transformations such as contrast/brightness adjustment, posterization or color shifting change the luminosity information. False positives are also possible with certain types of images, such as landscapes and images where a single color dominates the others.
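A minimal sketch of the 768-byte RGB histogram fingerprint, assuming pixels arrive as 8-bit `(r, g, b)` tuples. The function names are hypothetical, and rescaling each channel so its fullest bin maps to 255 is one assumed way to fit each bin into a byte while staying independent of image size. The distance is the "subtract one histogram from another" step, reduced to a single number by summing absolute differences. In practice the pixels could come from Pillow (`Image.open(path).convert("RGB").getdata()`), and Pillow's `Image.histogram()` already returns the 768 raw counts for an RGB image.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (r, g, b)


def rgb_histogram_fingerprint(pixels: Iterable[Pixel]) -> bytes:
    """Return a 768-byte fingerprint: 256 bins each for R, G and B.

    Each channel is rescaled so that its fullest bin maps to 255, which
    makes the result independent of the number of pixels (image size)."""
    counts = [[0] * 256 for _ in range(3)]
    total = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            counts[channel][value] += 1
        total += 1
    if total == 0:
        raise ValueError("image has no pixels")
    fingerprint = bytearray()
    for channel_counts in counts:
        peak = max(channel_counts)
        fingerprint.extend(round(c * 255 / peak) for c in channel_counts)
    return bytes(fingerprint)


def histogram_distance(a: bytes, b: bytes) -> float:
    """Subtract one histogram from the other and sum the absolute
    differences, scaled to 0.0 (identical) .. 1.0."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a))


# Hypothetical 4-pixel "image" used only for illustration.
image = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (10, 10, 10)]
flipped = list(reversed(image))  # same pixels, different positions
enlarged = image * 4             # every pixel repeated, like a scaled copy
dark = [(0, 0, 0)] * 4

fp = rgb_histogram_fingerprint(image)
print(histogram_distance(fp, rgb_histogram_fingerprint(flipped)))   # 0.0
print(histogram_distance(fp, rgb_histogram_fingerprint(enlarged)))  # 0.0
print(histogram_distance(fp, rgb_histogram_fingerprint(dark)))      # greater than 0
```

Because pixel positions never enter the computation, the flipped and enlarged copies match exactly, which is the robustness to affine transformations described above.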

Using scaled images is another way to reduce the information density of the image to a level that is easier to compare. Reductions below 10% of the original image size generally lose too much of the information to be of use, so an 800x800 pixel image can be scaled down to 80x80 and still provide enough information to perform decent fingerprinting. Unlike histogram data, thumbnails require anisotropic scaling of the image data when the source images have varying aspect ratios. For example, squeezing a 300x800 image into an 80x80 thumbnail deforms it, so comparing it with a very similar 300x500 image can cause false negatives. Thumbnail fingerprints also often produce false negatives when affine transformations are involved: if you flip or rotate an image, its thumbnail will be quite different from the original's, and the two may be judged dissimilar even though they show the same content.
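A minimal sketch of a thumbnail fingerprint, assuming a grayscale image given as a list of rows. Each thumbnail cell is the average (box filter) of the source pixels that map onto it, so non-square inputs are stretched or squeezed to a square, which is the anisotropic scaling described above. The names `thumbnail` and `thumbnail_distance` are hypothetical; a library resize (for example Pillow's `Image.resize`) would normally do the scaling.

```python
from typing import List, Sequence


def thumbnail(gray: Sequence[Sequence[float]], size: int = 80) -> List[List[float]]:
    """Box-filter an image of any shape down to size x size.

    Non-square inputs are scaled anisotropically (stretched/squeezed)."""
    h, w = len(gray), len(gray[0])
    thumb = []
    for ty in range(size):
        y0 = ty * h // size
        y1 = max((ty + 1) * h // size, y0 + 1)  # at least one source row
        row = []
        for tx in range(size):
            x0 = tx * w // size
            x1 = max((tx + 1) * w // size, x0 + 1)  # at least one source column
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        thumb.append(row)
    return thumb


def thumbnail_distance(a: Sequence[Sequence[float]],
                       b: Sequence[Sequence[float]]) -> float:
    """Mean absolute difference of two equal-size thumbnails, scaled to
    0.0 .. 1.0 for 8-bit grey levels."""
    diffs = [abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / (255 * len(diffs))


# Hypothetical 8x8 left-to-right gradient and its mirror image.
gradient = [[x * 32 for x in range(8)] for _ in range(8)]
mirrored = [list(reversed(row)) for row in gradient]

t1 = thumbnail(gradient, size=4)
t2 = thumbnail(mirrored, size=4)
print(thumbnail_distance(t1, t1))  # 0.0
print(thumbnail_distance(t1, t2))  # about 0.502, despite identical content
```

The mirrored gradient contains exactly the same pixels, yet its thumbnail is about half the maximum distance away: the false negative caused by flipping.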

Combining both techniques is a reasonable way to hedge your bets and reduce the occurrence of both false positives and false negatives.
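One simple way to combine the two fingerprints is a decision rule over their distances, sketched below under the assumption that both distances are already scaled to 0..1. The function name and the threshold values are hypothetical and would need tuning on real data. Requiring both fingerprints to agree trades recall for precision (fewer false positives); accepting either one does the opposite (fewer false negatives, e.g. for flipped images whose histograms still match).

```python
def hybrid_is_similar(hist_distance: float, thumb_distance: float,
                      hist_max: float = 0.05, thumb_max: float = 0.10,
                      require_both: bool = True) -> bool:
    """Combine a histogram distance and a thumbnail distance (both 0..1).

    hist_max / thumb_max are hypothetical thresholds, not tuned values."""
    hist_ok = hist_distance <= hist_max
    thumb_ok = thumb_distance <= thumb_max
    if require_both:
        # Both fingerprints must agree: fewer false positives.
        return hist_ok and thumb_ok
    # Either fingerprint may vouch for the match: fewer false negatives.
    return hist_ok or thumb_ok


print(hybrid_is_similar(0.01, 0.02))                      # True
print(hybrid_is_similar(0.01, 0.40))                      # False: thumbnails disagree
print(hybrid_is_similar(0.01, 0.40, require_both=False))  # True
```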

LBushkin answered Dec 05 '22 10:12