The similarity of the two images is detected using the package “imagehash”. If two images are identical or almost identical, the imagehash difference will be 0. Two images are more similar if the imagehash difference is closer to 0.
This is a huge topic, with answers from 3 lines of code to entire research magazines.
I will outline the most common such techniques and their results.
One of the simplest & fastest methods. Proposed decades ago as a means to find picture simmilarities. The idea is that a forest will have a lot of green, and a human face a lot of pink, or whatever. So, if you compare two pictures with forests, you'll get some simmilarity between histograms, because you have a lot of green in both.
Downside: it is too simplistic. A banana and a beach will look the same, as both are yellow.
OpenCV method: compareHist()
A good example here matchTemplate finding good match. It convolves the search image with the one being search into. It is usually used to find smaller image parts in a bigger one.
Downsides: It only returns good results with identical images, same size & orientation.
OpenCV method: matchTemplate()
Considered one of the most efficient ways to do image search. A number of features are extracted from an image, in a way that guarantees the same features will be recognized again even when rotated, scaled or skewed. The features extracted this way can be matched against other image feature sets. Another image that has a high proportion of the features matching the first one is considered to be depicting the same scene.
Finding the homography between the two sets of points will allow you to also find the relative difference in shooting angle between the original pictures or the amount of overlapping.
There are a number of OpenCV tutorials/samples on this, and a nice video here. A whole OpenCV module (features2d) is dedicated to it.
Downsides: It may be slow. It is not perfect.
Over on the OpenCV Q&A site I am talking about the difference between feature descriptors, which are great when comparing whole images and texture descriptors, which are used to identify objects like human faces or cars in an image.
If for matching identical images ( same size/orientation )
// Compare two images by getting the L2 error (square-root of sum of squared error).
double getSimilarity( const Mat A, const Mat B ) {
if ( A.rows > 0 && A.rows == B.rows && A.cols > 0 && A.cols == B.cols ) {
// Calculate the L2 relative error between images.
double errorL2 = norm( A, B, CV_L2 );
// Convert to a reasonable scale, since L2 error is summed across all pixels of the image.
double similarity = errorL2 / (double)( A.rows * A.cols );
return similarity;
}
else {
//Images have a different size
return 100000000.0; // Return a bad value
}
Source
Sam's solution should be sufficient. I've used combination of both histogram difference and template matching because not one method was working for me 100% of the times. I've given less importance to histogram method though. Here's how I've implemented in simple python script.
import cv2
class CompareImage(object):
def __init__(self, image_1_path, image_2_path):
self.minimum_commutative_image_diff = 1
self.image_1_path = image_1_path
self.image_2_path = image_2_path
def compare_image(self):
image_1 = cv2.imread(self.image_1_path, 0)
image_2 = cv2.imread(self.image_2_path, 0)
commutative_image_diff = self.get_image_difference(image_1, image_2)
if commutative_image_diff < self.minimum_commutative_image_diff:
print "Matched"
return commutative_image_diff
return 10000 //random failure value
@staticmethod
def get_image_difference(image_1, image_2):
first_image_hist = cv2.calcHist([image_1], [0], None, [256], [0, 256])
second_image_hist = cv2.calcHist([image_2], [0], None, [256], [0, 256])
img_hist_diff = cv2.compareHist(first_image_hist, second_image_hist, cv2.HISTCMP_BHATTACHARYYA)
img_template_probability_match = cv2.matchTemplate(first_image_hist, second_image_hist, cv2.TM_CCOEFF_NORMED)[0][0]
img_template_diff = 1 - img_template_probability_match
# taking only 10% of histogram diff, since it's less accurate than template method
commutative_image_diff = (img_hist_diff / 10) + img_template_diff
return commutative_image_diff
if __name__ == '__main__':
compare_image = CompareImage('image1/path', 'image2/path')
image_difference = compare_image.compare_image()
print image_difference
A little bit off topic but useful is the pythonic numpy
approach. Its robust and fast but just does compare pixels and not the objects or data the picture contains (and it requires images of same size and shape):
A very simple and fast approach to do this without openCV and any library for computer vision is to norm the picture arrays by
import numpy as np
picture1 = np.random.rand(100,100)
picture2 = np.random.rand(100,100)
picture1_norm = picture1/np.sqrt(np.sum(picture1**2))
picture2_norm = picture2/np.sqrt(np.sum(picture2**2))
After defining both normed pictures (or matrices) you can just sum over the multiplication of the pictures you like to compare:
1) If you compare similar pictures the sum will return 1:
In[1]: np.sum(picture1_norm**2)
Out[1]: 1.0
2) If they aren't similar, you'll get a value between 0 and 1 (a percentage if you multiply by 100):
In[2]: np.sum(picture2_norm*picture1_norm)
Out[2]: 0.75389941124629822
Please notice that if you have colored pictures you have to do this in all 3 dimensions or just compare a greyscaled version. I often have to compare huge amounts of pictures with arbitrary content and that's a really fast way to do so.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With