What is the fastest way to calculate sum of absolute differences between two images in Python?

Question

I am trying to compare images in a Python 3 application that uses Pillow and, optionally, NumPy. For compatibility reasons, I don't want to use any other external packages that are not pure Python. I found this Pillow-based algorithm on Rosetta Code and it may serve my purpose, but it takes some time to run:

from PIL import Image

def compare_images(img1, img2):
    """Compute percentage of difference between 2 JPEG images of same size
    (using the sum of absolute differences). Alternatively, compare two bitmaps
    as defined in basic bitmap storage. Useful for comparing two JPEG images
    saved with different compression ratios.

    Adapted from:
    http://rosettacode.org/wiki/Percentage_difference_between_images#Python

    :param img1: an Image object
    :param img2: an Image object
    :return: A float with the percentage of difference, or None if images are
    not directly comparable.
    """

    # Don't compare if images are of different modes or different sizes.
    if (img1.mode != img2.mode) \
            or (img1.size != img2.size) \
            or (img1.getbands() != img2.getbands()):
        return None

    pairs = zip(img1.getdata(), img2.getdata())
    if len(img1.getbands()) == 1:
        # for gray-scale jpegs
        dif = sum(abs(p1 - p2) for p1, p2 in pairs)
    else:
        dif = sum(abs(c1 - c2) for p1, p2 in pairs for c1, c2 in zip(p1, p2))

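    # Count one component per band, so gray-scale images are normalised correctly too.
    ncomponents = img1.size[0] * img1.size[1] * len(img1.getbands())
    return (dif / 255.0 * 100) / ncomponents  # Difference (percentage)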

Trying to find alternatives, I discovered that this function could be rewritten using Numpy:

import numpy as np
from PIL import Image

def compare_images_np(img1, img2):
    if (img1.mode != img2.mode) \
            or (img1.size != img2.size) \
            or (img1.getbands() != img2.getbands()):
        return None

    dif = 0
    for band_index, band in enumerate(img1.getbands()):
        m1 = np.array([p[band_index] for p in img1.getdata()]).reshape(*img1.size)
        m2 = np.array([p[band_index] for p in img2.getdata()]).reshape(*img2.size)
        dif += np.sum(np.abs(m1-m2))

    # Count one component per band rather than assuming three.
    ncomponents = img1.size[0] * img1.size[1] * len(img1.getbands())
    return (dif / 255.0 * 100) / ncomponents  # Difference (percentage)

I was expecting an improvement in processing speed, but it actually takes a little longer. I have no experience with NumPy beyond the basics, so I wonder whether there is any way to make it faster, for instance with an algorithm that does not need that for loop. Any ideas?

Mark Setchell · Accepted Answer

I think I understand what you are trying to do. I have no idea of the relative performance of our two machines, so you may want to benchmark it yourself.

from PIL import Image
import numpy as np

# Load images, convert to RGB, then to numpy arrays and ravel into long, flat things
a = np.array(Image.open('a.png').convert('RGB')).ravel()
b = np.array(Image.open('b.png').convert('RGB')).ravel()

# Calculate the sum of the absolute differences divided by number of elements
# (np.float was removed from recent NumPy releases; np.float64 is the same type)
MAE = np.sum(np.abs(np.subtract(a, b, dtype=np.float64))) / a.shape[0]

The only "tricky" thing in there is forcing the result type of np.subtract() to a float, which ensures negative differences can be stored. The image arrays are unsigned 8-bit, so a plain a - b would wrap around instead of going negative. It may be worth trying dtype=np.int16 on your hardware to see if that is faster.
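To make the wraparound concrete, here is a minimal sketch with two tiny hand-made uint8 arrays (the values are made up for illustration and are not taken from any image). It compares a plain subtraction with one where np.subtract() is asked for a signed result type.

```python
import numpy as np

# Two tiny uint8 "images" (hypothetical values, chosen to show the wraparound).
a = np.array([10, 200, 30], dtype=np.uint8)
b = np.array([20, 100, 40], dtype=np.uint8)

# Plain uint8 subtraction wraps modulo 256, so 10 - 20 becomes 246.
wrapped = a - b

# Asking np.subtract for a signed result type keeps the sign,
# so np.abs() then yields the true absolute differences.
signed = np.subtract(a, b, dtype=np.int16)
sad = int(np.abs(signed).sum())

print(wrapped.tolist())  # wrapped differences
print(signed.tolist())   # signed differences
print(sad)               # sum of absolute differences
```

Since differences between 8-bit values always lie between -255 and 255, int16 is wide enough to hold them.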

A quick way to benchmark it is as follows. Start IPython and then type in the following:

from PIL import Image
import numpy as np

a=np.array(Image.open('a.png').convert('RGB')).ravel()
b=np.array(Image.open('b.png').convert('RGB')).ravel()

Now you can time my code with:

%timeit np.sum(np.abs(np.subtract(a,b,dtype=np.float64))) / a.shape[0]
6.72 µs ± 21.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Or, you can try an int16 version like this:

%timeit np.sum(np.abs(np.subtract(a,b,dtype=np.int16))) / a.shape[0]
6.43 µs ± 30.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
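If IPython is not available, roughly the same comparison can be sketched with the standard library's timeit module. The arrays below are random stand-ins for two flattened 640x480 RGB images (an assumed size, not the images timed above), and mae is a hypothetical helper name, so the numbers it prints will not match the ones shown here.

```python
import timeit
import numpy as np

# Synthetic stand-ins for two flattened 640x480 RGB images (assumed size).
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=640 * 480 * 3, dtype=np.uint8)
b = rng.integers(0, 256, size=640 * 480 * 3, dtype=np.uint8)

def mae(dtype):
    """Mean absolute difference, with the subtraction done in `dtype`."""
    return np.sum(np.abs(np.subtract(a, b, dtype=dtype))) / a.shape[0]

# Time each variant over the same number of calls and report the average.
runs = 20
for dtype in (np.float64, np.int16):
    seconds = timeit.timeit(lambda: mae(dtype), number=runs)
    print(f"{dtype.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```

Both variants should return the same value; only the time taken is expected to differ from machine to machine.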

If you want to time your code, paste in your function, open the two images as img1 and img2, then use:

%timeit compare_images(img1, img2)
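For a drop-in comparison, the array approach can be wrapped in a function with the same interface as the one in the question. This is only a sketch: compare_images_fast is a hypothetical name, and it assumes 8-bit-per-channel modes such as L, RGB or RGBA.

```python
import numpy as np
from PIL import Image

def compare_images_fast(img1, img2):
    """Percentage difference between two PIL images, or None if not comparable.

    Hypothetical helper; assumes 8-bit-per-channel modes (L, RGB, RGBA).
    """
    if img1.mode != img2.mode or img1.size != img2.size:
        return None
    # np.asarray reads the pixel data in one step, with no Python-level loop.
    a = np.asarray(img1).astype(np.int16)
    b = np.asarray(img2).astype(np.int16)
    # int16 holds every difference in [-255, 255] without wraparound.
    dif = np.abs(a - b).sum(dtype=np.int64)
    # a.size counts every component, so gray-scale and RGB are both normalised.
    return float(dif) / 255.0 * 100 / a.size

# Usage with two small synthetic images (black versus white).
black = Image.new("RGB", (4, 4), (0, 0, 0))
white = Image.new("RGB", (4, 4), (255, 255, 255))
print(compare_images_fast(black, white))
```

It can then be timed next to the original with %timeit compare_images_fast(img1, img2).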

Tags: python, arrays, image-processing, numpy, python-imaging-library

Victor Domingos
