
Fast and efficient way to detect if two images are visually identical in Python

Given two images:

image1.jpg
image2.jpg

What's a fast way to detect if they are visually identical in Python? For example, they may have different EXIF data (which would yield different checksums) even though the image data is the same.

Imagemagick has an excellent tool, "identify," that produces a visual hash of an image, but it's very processor intensive.

asked Jun 01 '14 18:06 by ensnare


2 Answers

Using PIL/Pillow:

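from PIL import Image

im1 = Image.open('image1.jpg')
im2 = Image.open('image2.jpg')

# Compare dimensions first, then the decoded pixel data (metadata such as EXIF is ignored)
if im1.size == im2.size and list(im1.getdata()) == list(im2.getdata()):
    print("Identical")
else:
    print("Different")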
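
Building Python lists of every pixel can be slow for large images. A possible alternative, sketched here rather than benchmarked, is to let Pillow compute the per-pixel difference and check whether it is entirely zero. The helper name images_identical is made up for illustration, and converting both images to RGB is an assumption that discards any alpha channel:

```python
from PIL import Image, ImageChops


def images_identical(path1, path2):
    """Return True if both files decode to the same RGB pixel data."""
    with Image.open(path1) as im1, Image.open(path2) as im2:
        # Images with different dimensions can never be pixel-identical.
        if im1.size != im2.size:
            return False
        # Normalise the mode so both images have the same bands (assumption: RGB is enough).
        rgb1 = im1.convert("RGB")
        rgb2 = im2.convert("RGB")
    # getbbox() returns None when every pixel of the difference image is zero.
    return ImageChops.difference(rgb1, rgb2).getbbox() is None
```

Calling images_identical('image1.jpg', 'image2.jpg') would then replace the list comparison above. It only detects exact equality of the decoded pixels, so two JPEGs saved with different compression settings will still count as different.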
answered Oct 11 '22 01:10 by RodrigoOlmo


I'm submitting my way to tackle this anyway, even though the OP says that ImageMagick's way is too processor intensive (and even though my way does not involve Python). Maybe my answer will be useful to other people who arrive at this page via a search engine.

Be aware that any image comparison which is supposed to discover fine differences in hi-res images is more processor intensive than a discovery of big differences in low-res images, as it has to compare a lot more pixels.

Visualization of Differences

Here are two ImageMagick commands that compare two (same-sized!) images and render all differing pixels in red and all identical pixels in white. The first one uses the reference image as a faded-out background for the red-white pixel matrix; the second one (with -compose src) leaves that background out. .img may be any of the IM-supported formats (.png, .PnG, .pNG, .PNG, .jpg, .jpeg, .jPeG, .tif, .tiff, .ppm, .gif, .pdf, ...):

 compare reference.img similar.img  delta.img
 compare reference.img similar.img  -compose src delta.img

By default, the comparison is made at 72 PPI. If you need more resolution (for example, with a vector-based image such as a PDF page), you can add -density to increase it. Of course, the processing time will increase accordingly:
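
To get a feel for the cost, here is a rough back-of-the-envelope sketch in Python. It assumes vector input (such as a PDF page) whose rendered width and height both scale linearly with the density; pixel_count_factor is a made-up helper name:

```python
def pixel_count_factor(density_ppi, base_ppi=72):
    """Factor by which the rendered pixel count grows relative to base_ppi."""
    # Width and height each scale linearly with density, so the area scales quadratically.
    return (density_ppi / base_ppi) ** 2


for ppi in (72, 300, 600):
    print(f"{ppi} PPI: {pixel_count_factor(ppi):.1f}x the pixels of a 72 PPI render")
```

So a 300 PPI comparison has to look at roughly 17 times as many pixels as the default one.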

 compare -density 300 reference.img similar.img delta.img

If you add a fuzz factor, you can tell ImageMagick to treat two pixels as identical when their colors are no more than a certain distance apart:

 compare -fuzz '3%' reference.img similar.img -compose src delta.img
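
For readers who want the idea in Python terms, here is a simplified illustration of a fuzz threshold. It is not ImageMagick's exact formula: it assumes 8-bit RGB tuples and plain Euclidean distance, and count_differing_pixels is a made-up name:

```python
import math


def count_differing_pixels(pixels_a, pixels_b, fuzz_percent=0.0):
    """Count pixel pairs whose RGB distance exceeds the fuzz threshold."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    # Largest possible distance between two 8-bit RGB colours (black vs. white).
    max_distance = math.sqrt(3 * 255 ** 2)
    threshold = max_distance * fuzz_percent / 100.0
    return sum(1 for a, b in zip(pixels_a, pixels_b) if math.dist(a, b) > threshold)


reference = [(255, 0, 0), (0, 0, 0)]
similar = [(250, 0, 0), (0, 0, 0)]
print(count_differing_pixels(reference, similar))                  # strict comparison
print(count_differing_pixels(reference, similar, fuzz_percent=3))  # tolerant comparison
```

With no fuzz the slightly darker red counts as a difference; with a 3% tolerance it does not.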

pHash-ed difference value

More recent versions of ImageMagick support the phash algorithm:

 compare -metric phash reference.img similar.img -compose src delta.img

Besides creating delta.img for visualization, this returns a numeric value that indicates the "difference" between the two images. The closer it is to 0, the more similar the two images are.
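
Since the question asks for Python, here is a minimal sketch of calling this from a script. It assumes the compare binary is on the PATH (on ImageMagick 7 it may need to be invoked as "magick compare"), that the metric is the first token printed to stderr, and that the pseudo-file null: is acceptable for discarding the delta image. All three function names are made up for illustration:

```python
import subprocess


def build_compare_command(reference, candidate, metric="phash", delta="null:"):
    """Build the argument list; 'null:' discards the delta image."""
    return ["compare", "-metric", metric, reference, candidate, delta]


def parse_metric(stderr_text):
    """Extract the leading number from the metric output."""
    return float(stderr_text.split()[0])


def image_distance(reference, candidate, metric="phash"):
    """Run compare and return the reported metric as a float."""
    result = subprocess.run(
        build_compare_command(reference, candidate, metric),
        capture_output=True,
        text=True,
    )
    # compare exits with 0 (similar) or 1 (dissimilar); anything else is an error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return parse_metric(result.stderr)
```

A call such as image_distance('image1.jpg', 'image2.jpg') == 0 would then serve as the "visually identical" test, subject to the resolution caveats shown in the examples below.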

Examples:

Create a few small PDF pages with minor differences in them. I'm using Ghostscript:

gs -o ref1.pdf -sDEVICE=pdfwrite -g1050x1350 \
 -c "/Courier findfont 160 scalefont setfont 10.0 10.0 moveto (0) show showpage"

gs -o ref2.pdf -sDEVICE=pdfwrite -g1050x1350 \
 -c "/Courier findfont 160 scalefont setfont 10.1 10.1 moveto (0) show showpage"

gs -o ref3.pdf -sDEVICE=pdfwrite -g1050x1350 \
 -c "/Courier findfont 160 scalefont setfont 10.0 10.0 moveto (O) show showpage"

gs -o ref4.pdf -sDEVICE=pdfwrite -g1050x1350 \
 -c "/Courier findfont 160 scalefont setfont 10.1 10.1 moveto (O) show showpage"

Now compare ref1.pdf with ref3.pdf at the default resolution of 72 PPI:

compare -metric phash ref1.pdf ref3.pdf delta-ref1-ref3.pdf
  7.61662

The returned pHash value is 7.61662. This indicates that ImageMagick's compare discovered at least some differences.

Let's look at the visualization. I'll create a side-by-side visualization of the three PDFs/images (to be shown below):

convert                                    \
   -mattecolor blue                        \
      \( ref1.pdf -frame 2x2 \)            \
    null:                                  \
      \( ref3.pdf -frame 2x2 \)            \
    null:                                  \
      \( delta-ref1-ref3.pdf -frame 2x2 \) \
   +append                                 \
    ref1-ref3-delta.png 

Visualization of differences: ref1.pdf (left), ref3.pdf (center) and delta-ref1-ref3.pdf (right)

As you can see, the different shapes of the 0 (digit 'zero') and the O (letter o, capital version) are standing out quite well.

Now the next one: where ref1.pdf is compared to ref2.pdf, also at 72 PPI.

compare -metric phash ref1.pdf ref2.pdf delta-ref1-ref2.pdf
  0

The returned pHash value now is 0. This indicates that ImageMagick discovered no difference!

Create a side-by-side visualization of the three PDFs/images:

convert                                    \
   -mattecolor blue                        \
      \( ref1.pdf -frame 2x2 \)            \
    null:                                  \
      \( ref2.pdf -frame 2x2 \)            \
    null:                                  \
      \( delta-ref1-ref2.pdf -frame 2x2 \) \
   +append                                 \
    ref1-ref2-delta.png 

Visualization of differences: ref1.pdf (left), ref2.pdf (center) and delta-ref1-ref2.pdf (right)

As you can see, at 72 PPI ImageMagick does not discover a difference between the two PDFs (a difference would be indicated by red pixels). According to the Ghostscript commands, both show the digit 0, but at positions that are shifted by 0.1 pt in the x- and y-directions. So in reality, in the original PDFs, there IS a difference. But when rendered at 72 PPI, this difference isn't visible.
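
The arithmetic behind this can be sketched in a few lines of Python. The only fact used is that one PostScript/PDF point is 1/72 inch; shift_in_pixels is a made-up helper name:

```python
POINTS_PER_INCH = 72  # one PostScript/PDF point is 1/72 inch


def shift_in_pixels(shift_pt, density_ppi):
    """Convert a shift given in points into pixels at the given render density."""
    return shift_pt * density_ppi / POINTS_PER_INCH


for ppi in (72, 600):
    print(f"{ppi} PPI: a 0.1 pt shift is {shift_in_pixels(0.1, ppi):.3f} px")
```

At 72 PPI the shift is only a tenth of a pixel, which can easily vanish during rasterization, while at 600 PPI it is most of a pixel wide.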

Let's try to see the difference with density 600 then:

compare        \
 -metric phash \
 -density 600  \
  ref1.pdf     \
  ref2.pdf     \
  ref1-ref2-at-density600-delta.png 

0.00172769

The returned pHash value at 600 PPI now is 0.00172769. This is close to zero, but still a difference. The difference is less than the one between ref1.pdf and ref3.pdf.

The difference is clearly highlighted now in the visual comparison, even though only by a thin line of red pixels.

answered Oct 11 '22 00:10 by Kurt Pfeifle