Is there a quick and easy way to do such a comparison?
I've found a few image comparison questions on Stack Overflow, but none of them actually provided an answer to this question.
I have image files on my filesystem and a script that fetches images from URLs. I want to check whether the image at a URL is the same as the one already on disk. Normally I would load both the image on disk and the one from the URL into PIL objects and use the following function I found:
from PIL import ImageChops

def equal(im1, im2):
    return ImageChops.difference(im1, im2).getbbox() is None
but this doesn't work if you have an image saved to disk with PIL, as it gets compressed even if you turn the quality up to 100: im1.save(outfile,quality=100).
My code is currently the following: http://pastebin.com/295kDMsp but the image always ends up re-saved.
Simple dummy method: resize the largest image to match the size of the smallest image and compare. To compare two images i and j, resize the larger of them to the dimensions of the other one using 3-lobed Lanczos, which is conveniently available in PIL by doing img1.resize(img2.size, Image.
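The resize-then-compare idea can be sketched roughly as follows. The sentence above is cut off before the filter name, so `Image.LANCZOS` is my assumption for the filter; `resize_to_smaller` is a hypothetical helper, and "largest" is taken to mean the image with more pixels. The two images are synthetic stand-ins, not real files.

```python
from PIL import Image, ImageChops

def resize_to_smaller(img1, img2):
    """Resize the image with more pixels to the size of the other one."""
    area1 = img1.size[0] * img1.size[1]
    area2 = img2.size[0] * img2.size[1]
    if area1 > area2:
        # Assumed filter: Lanczos resampling as exposed by Pillow.
        return img1.resize(img2.size, Image.LANCZOS), img2
    return img1, img2.resize(img1.size, Image.LANCZOS)

# Synthetic stand-ins: the same flat gray image at two sizes.
big = Image.new("L", (64, 64), 128)
small = Image.new("L", (16, 16), 128)

r1, r2 = resize_to_smaller(big, small)
# Once the sizes match, any per-pixel comparison can be applied.
diff = ImageChops.difference(r1, r2)
print(r1.size, r2.size, diff.size)
```

After resampling, the images will rarely match bit for bit, so the difference image is better fed into a similarity metric with a threshold than tested for exact equality.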
The question's title suggests you have two exact images to compare, and that is trivially done. Now, if you have similar images to compare, that explains why you didn't find a fully satisfactory answer: there is no metric applicable to every problem that gives the expected results (note that the expected results vary between applications). One of the problems is that it is hard, in the sense that there is no common agreement, to compare images with multiple bands, like color images. To handle that, I will apply a given metric to each band and take the lowest resulting value as the result. This assumes the metric has a well-established range, like [0, 1], and that the maximum value in this range means the images are identical (by the given metric). Conversely, the minimum value means the images are totally different.
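The per-band rule described above can be sketched like this. It assumes the images are already NumPy arrays of shape (height, width, bands); `pixel_agreement` is only a placeholder metric I made up for the illustration (it is not one of the metrics from this answer), and `min_over_bands` is a hypothetical helper name.

```python
import numpy as np

def pixel_agreement(band1, band2):
    """Placeholder metric in [0, 1]: fraction of pixels that are equal."""
    return float(np.mean(band1 == band2))

def min_over_bands(img1, img2, metric):
    """Apply `metric` to each band (last axis) and keep the lowest score."""
    return min(metric(img1[..., c], img2[..., c])
               for c in range(img1.shape[-1]))

# Two 2x2 RGB images that differ in one pixel of the red band only.
a = np.zeros((2, 2, 3), dtype=np.uint8)
b = a.copy()
b[0, 0, 0] = 255

score = min_over_bands(a, b, pixel_agreement)
print(score)  # red: 0.75, green: 1.0, blue: 1.0 -> 0.75
```

Taking the minimum is the pessimistic choice: a single band that differs is enough to pull the overall score down.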
So, all I will do here is give you two metrics. One of them is SSIM and the other one I will call NRMSE (a normalization of the root of the mean squared error). I chose to present the second one because it is a very simple method, and it may be enough for your problem.
Let us get started with examples. The images are in this order: f = original image in PNG, g1 = JPEG at 50% quality of f (made with convert f -quality 50 g), g2 = JPEG at 1% quality of f, h = "lightened" g2.
Results (rounded):
In a way, both metrics handled the modifications well, but SSIM proved to be more sensible, reporting lower similarities when the images were in fact visually distinct and higher values when the images were visually very similar. The next example considers a color image (f = original image, and g = JPEG at 5% quality).
So, it is up to you to determine which metric you prefer and a threshold value for it.
Now, the metrics. What I call NRMSE is simply 1 - [RMSE / (maxval - minval)], where maxval is the maximum intensity over the two images being compared, and minval is correspondingly the minimum. RMSE is the square root of MSE: sqrt[sum((A - B) ** 2) / |A|], where |A| means the number of elements in A. Since no pixel difference can exceed maxval - minval, RMSE is at most maxval - minval, which keeps the metric in [0, 1]. If you want to further understand the meaning of MSE in images, see, for example, https://ece.uwaterloo.ca/~z70wang/publications/SPM09.pdf. The metric SSIM (Structural SIMilarity) is more involved, and you can find details in the link above. To easily apply the metrics, consider the following code:
import numpy
from scipy.signal import fftconvolve


def ssim(im1, im2, window, k=(0.01, 0.03), l=255):
    """See https://ece.uwaterloo.ca/~z70wang/research/ssim/"""
    # Check if the window is smaller than the images.
    for a, b in zip(window.shape, im1.shape):
        if a > b:
            return None, None
    # Values in k must be positive according to the base implementation.
    for ki in k:
        if ki < 0:
            return None, None

    c1 = (k[0] * l) ** 2
    c2 = (k[1] * l) ** 2
    window = window / numpy.sum(window)

    # Local means, variances and covariance, computed with the window.
    mu1 = fftconvolve(im1, window, mode='valid')
    mu2 = fftconvolve(im2, window, mode='valid')
    mu1_sq = mu1 * mu1
    mu2_sq = mu2 * mu2
    mu1_mu2 = mu1 * mu2
    sigma1_sq = fftconvolve(im1 * im1, window, mode='valid') - mu1_sq
    sigma2_sq = fftconvolve(im2 * im2, window, mode='valid') - mu2_sq
    sigma12 = fftconvolve(im1 * im2, window, mode='valid') - mu1_mu2

    if c1 > 0 and c2 > 0:
        num = (2 * mu1_mu2 + c1) * (2 * sigma12 + c2)
        den = (mu1_sq + mu2_sq + c1) * (sigma1_sq + sigma2_sq + c2)
        ssim_map = num / den
    else:
        # With a zero constant the denominators may vanish, so divide
        # only where it is safe to do so.
        num1 = 2 * mu1_mu2 + c1
        num2 = 2 * sigma12 + c2
        den1 = mu1_sq + mu2_sq + c1
        den2 = sigma1_sq + sigma2_sq + c2
        ssim_map = numpy.ones(numpy.shape(mu1))
        index = (den1 * den2) > 0
        ssim_map[index] = (num1[index] * num2[index]) / (den1[index] * den2[index])
        index = (den1 != 0) & (den2 == 0)
        ssim_map[index] = num1[index] / den1[index]

    mssim = ssim_map.mean()
    return mssim, ssim_map


def nrmse(im1, im2):
    a, b = im1.shape
    rmse = numpy.sqrt(numpy.sum((im2 - im1) ** 2) / float(a * b))
    max_val = max(numpy.max(im1), numpy.max(im2))
    min_val = min(numpy.min(im1), numpy.min(im2))
    if max_val == min_val:
        # Both images hold the same constant value, so they are identical.
        return 1.0
    return 1 - (rmse / (max_val - min_val))


if __name__ == "__main__":
    import sys
    # In recent SciPy versions gaussian lives in scipy.signal.windows.
    from scipy.signal.windows import gaussian
    from PIL import Image

    img1 = Image.open(sys.argv[1])
    img2 = Image.open(sys.argv[2])

    if img1.size != img2.size:
        raise SystemExit("Error: image sizes differ")

    # Create a 2d gaussian for the window parameter
    win = numpy.array([gaussian(11, 1.5)])
    win2d = win * (win.T)

    num_metrics = 2
    sim_index = [2 for _ in range(num_metrics)]
    for band1, band2 in zip(img1.split(), img2.split()):
        b1 = numpy.asarray(band1, dtype=numpy.double)
        b2 = numpy.asarray(band2, dtype=numpy.double)
        # SSIM
        res, smap = ssim(b1, b2, win2d)
        if res is None:
            raise SystemExit("Error: images are smaller than the SSIM window")

        m = [res, nrmse(b1, b2)]
        # Keep the lowest value of each metric over all bands.
        for i in range(num_metrics):
            sim_index[i] = min(m[i], sim_index[i])

    print("Result:", sim_index)
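As a quick sanity check of the NRMSE definition, here are three cases small enough to verify by hand. The function below is a compact restatement of the formula (using `numpy.mean` instead of an explicit sum and division), and the 2x2 arrays are made-up values, not real image data.

```python
import numpy as np

def nrmse(im1, im2):
    """1 - RMSE / (max_val - min_val), as defined in this answer."""
    rmse = np.sqrt(np.mean((im2 - im1) ** 2))
    max_val = max(im1.max(), im2.max())
    min_val = min(im1.min(), im2.min())
    return 1 - rmse / (max_val - min_val)

a = np.array([[0.0, 255.0], [0.0, 255.0]])
b = np.array([[0.0, 255.0], [0.0, 0.0]])   # one pixel off by 255
inverted = 255.0 - a                        # every pixel off by 255

same = nrmse(a, a)          # RMSE = 0               -> 1.0
one_off = nrmse(a, b)       # RMSE = sqrt(255**2/4)  -> 0.5
opposite = nrmse(a, inverted)  # RMSE = 255          -> 0.0
print(same, one_off, opposite)
```

Note how a single fully wrong pixel out of four already drops the score to 0.5, because the error is squared before averaging.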
Note that ssim refuses to compare images when the given window is larger than them. The window is typically very small (the default is 11x11), so if your images are smaller than that, there is not much "structure" (from the name of the metric) to compare and you should use something else (like the other function, nrmse). There is probably a better way to implement ssim, since in Matlab this runs much faster.
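If you want to decide up front which metric to use, the window check can be pulled out into its own function. This is only a sketch: `window_fits` is a hypothetical helper that mirrors the size test at the top of ssim, and the arrays are blank placeholders for real image bands.

```python
import numpy as np

def window_fits(image, window):
    """True when the window is no larger than the image along every axis."""
    return all(w <= s for w, s in zip(window.shape, image.shape))

window = np.ones((11, 11))        # same size as the default SSIM window
tiny = np.zeros((8, 8))           # smaller than the window
normal = np.zeros((64, 48))       # large enough for SSIM

for name, img in (("tiny", tiny), ("normal", normal)):
    choice = "ssim" if window_fits(img, window) else "nrmse"
    print(name, "->", choice)
```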
You can make your own comparison using the squared difference. You would then set a threshold, like 95%, and if the images are that similar, you don't have to download it. This eliminates the problem of compression.
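One way this could look, as a rough sketch: the mean squared difference is scaled by the largest possible squared difference so the score lands in [0, 1]. The scaling, the names `similarity` and `needs_update`, and the assumption of 8-bit data (`max_value=255`) are my choices for the illustration; the arrays are made-up values.

```python
import numpy as np

def similarity(im1, im2, max_value=255.0):
    """Map the mean squared difference to [0, 1]; 1.0 means identical."""
    diff = im1.astype(np.float64) - im2.astype(np.float64)
    return 1.0 - np.mean(diff ** 2) / max_value ** 2

def needs_update(local, remote, threshold=0.95):
    """True when the images are less similar than the threshold."""
    return bool(similarity(local, remote) < threshold)

local = np.full((4, 4), 100, dtype=np.uint8)
noisy = local.copy()
noisy[0, 0] = 110                 # small, compression-like change
different = np.full((4, 4), 250, dtype=np.uint8)

print(needs_update(local, noisy))      # small change, stays above threshold
print(needs_update(local, different))  # large change, falls below threshold
```

The cast to float before subtracting matters: subtracting `uint8` arrays directly would wrap around instead of producing negative differences.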