 

Compare similarity of images using OpenCV with Python

I'm trying to compare an image to a list of other images and return a selection of images from that list (like Google image search) that are at least 70% similar to it.

I got this code from this post and changed it for my context:

# Uses the OpenCV 2.4 API: cv2.FeatureDetector_create, cv2.DescriptorExtractor_create
# and cv2.KNearest no longer exist in OpenCV 3+.
# MEDIA_ROOT is defined elsewhere in the project.
import cv2
import numpy as np

# Load the images
img = cv2.imread(MEDIA_ROOT + "/uploads/imagerecognize/armchair.jpg")

# Convert them to grayscale
imgg = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# SURF extraction
surf = cv2.FeatureDetector_create("SURF")
surfDescriptorExtractor = cv2.DescriptorExtractor_create("SURF")
kp = surf.detect(imgg)
kp, descritors = surfDescriptorExtractor.compute(imgg, kp)

# Setting up samples and responses for kNN
samples = np.array(descritors)
responses = np.arange(len(kp), dtype=np.float32)

# kNN training
knn = cv2.KNearest()
knn.train(samples, responses)

modelImages = [MEDIA_ROOT + "/uploads/imagerecognize/1.jpg",
               MEDIA_ROOT + "/uploads/imagerecognize/2.jpg",
               MEDIA_ROOT + "/uploads/imagerecognize/3.jpg"]

for modelImage in modelImages:

    # Now loading a template image and searching for similar keypoints
    template = cv2.imread(modelImage)
    templateg = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    keys = surf.detect(templateg)

    keys, desc = surfDescriptorExtractor.compute(templateg, keys)

    for h, des in enumerate(desc):
        des = np.array(des, np.float32).reshape((1, 128))

        retval, results, neigh_resp, dists = knn.find_nearest(des, 1)
        res, dist = int(results[0][0]), dists[0][0]

        if dist < 0.1:  # draw matched keypoints in red color
            color = (0, 0, 255)
        else:  # draw unmatched in blue color
            # print dist
            color = (255, 0, 0)

        # Draw matched key points on original image
        x, y = kp[res].pt
        center = (int(x), int(y))
        cv2.circle(img, center, 2, color, -1)

        # Draw matched key points on template image
        x, y = keys[h].pt
        center = (int(x), int(y))
        cv2.circle(template, center, 2, color, -1)

    cv2.imshow('img', img)
    cv2.imshow('tm', template)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

My question is, how can I compare the image with the list of images and get only the similar images? Is there any method to do this?

asked Nov 14 '12 at 13:11 by leeeandroo



People also ask

How does Python compare images in OpenCV?

Use OpenCV's norm() function (cv2.norm) to compute a distance between two images of the same size.
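As a minimal sketch of that idea: cv2.norm(a, b, cv2.NORM_L2) returns the Euclidean (L2) distance between two same-sized images. The NumPy version below computes the same value so it runs without OpenCV; the function name is hypothetical.

```python
import numpy as np


def l2_image_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two images of identical shape.

    Same value as cv2.norm(a, b, cv2.NORM_L2).
    """
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    # Cast to float first: subtracting uint8 arrays would wrap around.
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sqrt(np.sum(diff * diff)))


if __name__ == "__main__":
    img1 = np.zeros((4, 4), dtype=np.uint8)
    img2 = img1.copy()
    img2[0, 0] = 3
    img2[1, 1] = 4
    print(l2_image_distance(img1, img2))  # 5.0
```

A distance of 0 means the images are pixel-for-pixel identical; unlike a hash, it is sensitive to small shifts and brightness changes.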

How can we compare similarity between two images?

The similarity of two images can be measured with the imagehash package. If two images are identical or almost identical, the difference between their hashes is 0; the closer the difference is to 0, the more similar the images are.
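A minimal from-scratch sketch of the simplest such hash, the average hash (imagehash provides one as imagehash.average_hash): shrink the grayscale image to 8x8, set each bit according to whether that cell is brighter than the mean, and compare two hashes by counting differing bits (the Hamming distance). The block-mean downscaling is an assumption made to stay NumPy-only (imagehash resizes with PIL), image sides are assumed divisible by 8, and the function names are hypothetical.

```python
import numpy as np


def average_hash(gray: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Boolean hash_size x hash_size average hash of a 2-D grayscale image."""
    h, w = gray.shape
    if h % hash_size or w % hash_size:
        raise ValueError("image sides must be divisible by hash_size")
    # Downscale by averaging non-overlapping blocks (stand-in for a resize).
    blocks = gray.astype(np.float64).reshape(
        hash_size, h // hash_size, hash_size, w // hash_size
    )
    small = blocks.mean(axis=(1, 3))
    # One bit per cell: is this cell brighter than the mean of all cells?
    return small > small.mean()


def hash_difference(h1: np.ndarray, h2: np.ndarray) -> int:
    """Number of differing bits (Hamming distance); 0 means identical hashes."""
    return int(np.count_nonzero(h1 != h2))
```

Because each bit only records "brighter or darker than average", a uniformly darkened copy of an image gets the same hash, while an inverted copy differs in every bit.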

What is the difference between Python and OpenCV?

Python is a high-level programming language, whereas OpenCV is a library for computer vision. Python is used to write code, implement algorithms, develop systems, etc. Anything that can be computed can be implemented in Python.


1 Answer

I suggest you take a look at the earth mover's distance (EMD) between the images. This metric gives a sense of how hard it is to transform one normalized grayscale image into another, and it can be generalized to color images. A very good analysis of this method can be found in the following paper:

robotics.stanford.edu/~rubner/papers/rubnerIjcv00.pdf

It can be computed either on the whole image or on its histogram (which is much faster than the whole-image method). I'm not sure which method allows a full-image comparison, but for histogram comparison you can use the cv.CalcEMD2 function.
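For 1-D grayscale histograms there is a useful shortcut: when both histograms have the same total mass and the ground distance between bins i and j is |i - j|, the EMD equals the sum of absolute differences between the two cumulative histograms. A NumPy sketch under those assumptions (8-bit grayscale input, 256 bins; function names hypothetical), which needs no EMD solver for this special case:

```python
import numpy as np


def gray_histogram(gray: np.ndarray, bins: int = 256) -> np.ndarray:
    """Normalized intensity histogram of an 8-bit grayscale image."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    return hist / hist.sum()


def emd_1d(h1: np.ndarray, h2: np.ndarray) -> float:
    """EMD between two normalized 1-D histograms with ground distance |i - j|.

    For 1-D distributions of equal mass this equals the L1 distance
    between their cumulative sums.
    """
    return float(np.abs(np.cumsum(h1) - np.cumsum(h2)).sum())
```

The result is measured in gray levels: moving all the mass of a histogram three bins up costs 3.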

The only drawback is that this method does not give a percentage of similarity but a distance, which you can filter on (for example, keep only the images whose distance is below a threshold).
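To turn a distance into the "return only the similar images" step of the question, one option is to score every candidate, keep those under a threshold, and sort them closest first. A minimal sketch: `filter_similar` is a hypothetical helper, `distance` can be any function of two images (for example a histogram EMD), and the threshold has to be tuned on your own data, since there is no fixed mapping from an EMD value to "70% similar".

```python
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


def filter_similar(
    query: T,
    candidates: Dict[str, T],
    distance: Callable[[T, T], float],
    max_distance: float,
) -> List[Tuple[str, float]]:
    """Return (name, distance) pairs with distance <= max_distance, closest first."""
    scored = [(name, distance(query, item)) for name, item in candidates.items()]
    kept = [(name, d) for name, d in scored if d <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])
```

Passing the distance as a parameter keeps the filtering independent of which metric you end up choosing.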

I know this is not a full working algorithm, but it is a starting point, so I hope it helps.

EDIT:

Here is a sketch of how the EMD works in principle. The main idea is to take two normalized matrices (two grayscale images, each divided by its sum) and define a flow matrix that describes how much gray is moved from each pixel of the first image to each pixel of the second in order to turn the first image into the second (it can also be defined for non-normalized images, but that is more difficult).

In mathematical terms, the flow matrix is actually a four-dimensional tensor that gives the flow from the point (i,j) of the old image to the point (k,l) of the new one, but if you flatten your images you can turn it into an ordinary matrix, just one that is a little harder to read.
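A small NumPy check of that flattening, assuming C-order (row-major) flattening of H x W images: the 4-D flow T[i, j, k, l] becomes the (H*W) x (H*W) matrix F[p, q] with p = i*W + j and q = k*W + l, which is just a reshape. The sizes and random values are illustrative only.

```python
import numpy as np

H, W = 2, 3
rng = np.random.default_rng(0)

# 4-D flow tensor: T[i, j, k, l] = mass moved from pixel (i, j) to pixel (k, l).
T = rng.random((H, W, H, W))

# Flattening each pair of image indices gives an ordinary (H*W) x (H*W) matrix.
F = T.reshape(H * W, H * W)

# Entry check: pixel (i, j) -> row i*W + j, pixel (k, l) -> column k*W + l.
i, j, k, l = 1, 2, 0, 1
assert F[i * W + j, k * W + l] == T[i, j, k, l]
```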

This flow matrix has three constraints: each term must be non-negative, the sum of each row must equal the value of the corresponding starting pixel, and the sum of each column must equal the value of the corresponding destination pixel.
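These conditions are easy to check on a candidate flow. A sketch, assuming F[p, q] is the mass moved from flattened pixel p of the first image to flattened pixel q of the second; `is_feasible_flow` is a hypothetical name. The outer product of the two images (the initial guess in the test code) always passes, since its row sums are x[p] * sum(y) = x[p] and its column sums are y[q] * sum(x) = y[q].

```python
import numpy as np


def is_feasible_flow(F: np.ndarray, x: np.ndarray, y: np.ndarray, tol: float = 1e-9) -> bool:
    """Check the EMD flow constraints for flattened, normalized images x and y.

    F[p, q] is the mass moved from source pixel p (image x) to
    destination pixel q (image y).
    """
    x = np.ravel(x)
    y = np.ravel(y)
    F = np.asarray(F, dtype=np.float64).reshape(x.size, y.size)
    non_negative = bool(np.all(F >= -tol))
    rows_ok = np.allclose(F.sum(axis=1), x, atol=tol)  # each row sums to its source pixel
    cols_ok = np.allclose(F.sum(axis=0), y, atol=tol)  # each column sums to its destination pixel
    return non_negative and bool(rows_ok) and bool(cols_ok)
```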

Given this, you have to minimize the cost of the transformation, which is the sum, over all pairs of points, of the flow from (i,j) to (k,l) multiplied by the distance between (i,j) and (k,l).
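Written out as a linear program, with x and y the two normalized images flattened to N pixels, f_pq the flow from pixel p of the first image to pixel q of the second, and d_pq the Euclidean distance between the positions of those pixels (notation introduced here for illustration):

```latex
\min_{f}\ \sum_{p=1}^{N}\sum_{q=1}^{N} f_{pq}\, d_{pq}
\quad\text{subject to}\quad
f_{pq} \ge 0,\qquad
\sum_{q=1}^{N} f_{pq} = x_p \ \ (p = 1,\dots,N),\qquad
\sum_{p=1}^{N} f_{pq} = y_q \ \ (q = 1,\dots,N).
```

The minimum value is the EMD. Because both images sum to 1, one of the 2N equality constraints is implied by the others.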

It looks a little complicated in words, so here is the test code. I believe the logic is correct, but I'm not sure why the SciPy solver complains about it (maybe you should look at OpenOpt or something similar):

import numpy as np
from scipy.optimize import minimize

# original data, two 2x2 images, normalized
x = np.random.rand(2, 2)
x /= np.sum(x)
y = np.random.rand(2, 2)
y /= np.sum(y)

# initial guess of the flow matrix:
# just the product of the image x as row for the image y as column.
# This is a working flow, but is not an optimal one
F = np.outer(x.flatten(), y.flatten()).flatten()

# distance matrix, based on euclidean distance
# (indexing='ij' so the coordinates match the C-order flattening of the images)
row_x, col_x = np.meshgrid(range(x.shape[0]), range(x.shape[1]), indexing='ij')
row_y, col_y = np.meshgrid(range(y.shape[0]), range(y.shape[1]), indexing='ij')
rows = (row_x.reshape((-1, 1)) - row_y.reshape((1, -1)))**2
cols = (col_x.reshape((-1, 1)) - col_y.reshape((1, -1)))**2
D = np.sqrt(rows + cols)

D = D.flatten()
x = x.flatten()
y = y.flatten()
# COST = sum(F*D)

# cost function
fun = lambda F: np.sum(F*D)
jac = lambda F: D
# array of constraints
# the constraint of sum one is implicit given the later constraints
cons = []
# each row and column should sum to the value of the start and destination array;
# i=i binds the current index (a plain closure would only ever see the last i)
cons += [{'type': 'eq', 'fun': lambda F, i=i: np.sum(F.reshape((x.size, y.size))[i, :]) - x[i]}
         for i in range(x.size)]
# the last column constraint is implied by the others (both images sum to 1), so it is dropped
cons += [{'type': 'eq', 'fun': lambda F, i=i: np.sum(F.reshape((x.size, y.size))[:, i]) - y[i]}
         for i in range(y.size - 1)]
# the values of F should be non-negative: one (min, max) pair per variable
bnds = [(0, None)] * F.size

res = minimize(fun=fun, x0=F, method='SLSQP', jac=jac, bounds=bnds, constraints=cons)

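The variable res contains the result of the minimization. As I said, I'm not sure why it complains about a singular matrix. One likely cause is in the constraint list: each lambda captures the loop variable i by reference, so once the list comprehension has finished, every row constraint (and every column constraint) uses the last value of i, and SLSQP ends up with duplicated, linearly dependent constraints. Binding the index with lambda F, i=i avoids this. Note also that the bounds need one (min, max) pair per variable, i.e. [(0, None)] * F.size, not (0, None)*F.size.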
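Since the problem is a linear program, an LP solver is a more natural fit than SLSQP. The sketch below swaps in scipy.optimize.linprog (assuming SciPy 1.6 or later for method="highs"); `emd_2d` is a hypothetical name, and the constraint matrix is built densely, so it is only practical for small images or thumbnails.

```python
import numpy as np
from scipy.optimize import linprog


def emd_2d(img1: np.ndarray, img2: np.ndarray) -> float:
    """EMD between two same-shaped grayscale images, via an exact LP solve.

    Each image is normalized to sum to 1; the ground distance is the
    Euclidean distance between pixel coordinates.
    """
    x = img1.astype(np.float64).ravel()
    y = img2.astype(np.float64).ravel()
    x /= x.sum()
    y /= y.sum()
    n = x.size

    # Pixel coordinates in the same C order as ravel().
    rows, cols = np.indices(img1.shape)
    coords = np.column_stack([rows.ravel(), cols.ravel()]).astype(np.float64)
    D = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2))

    # Flow variables f[p, q] flattened row-major: variable index p * n + q.
    A_rows = np.kron(np.eye(n), np.ones((1, n)))  # sum_q f[p, q] = x[p]
    A_cols = np.kron(np.ones((1, n)), np.eye(n))  # sum_p f[p, q] = y[q]
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([x, y])

    # bounds=(0, None) applies f >= 0 to every variable.
    res = linprog(D.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)
```

For example, moving all the mass from the top-left pixel of a 2x2 image to the bottom-right one costs the diagonal distance, sqrt(2).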

The only problem with this algorithm is that it is not very fast, so you can't run it on demand; instead, you have to compute it patiently when the dataset is created and store the results somewhere.
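One way to follow that advice, as a sketch: compute the pairwise distance matrix once when the dataset is built, save it with numpy.save, and at query time just load it and threshold one row. The helper names and file path are hypothetical, and `distance` stands for whatever slow measure you use (for example an EMD), assumed symmetric and zero for identical images.

```python
from typing import Callable, Sequence

import numpy as np


def pairwise_distances(images: Sequence[np.ndarray],
                       distance: Callable[[np.ndarray, np.ndarray], float]) -> np.ndarray:
    """Symmetric matrix M[a, b] = distance(images[a], images[b]), computed once."""
    n = len(images)
    M = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):  # assumes symmetry, so only the upper triangle is computed
            M[a, b] = M[b, a] = distance(images[a], images[b])
    return M


def build_index(images, distance, path: str) -> None:
    """Offline step: compute all distances and store them on disk (.npy)."""
    np.save(path, pairwise_distances(images, distance))


def similar_to(index_path: str, query: int, max_distance: float) -> np.ndarray:
    """Online step: indices of stored images within max_distance of image `query`."""
    M = np.load(index_path)
    hits = np.flatnonzero(M[query] <= max_distance)
    return hits[hits != query]
```

This only covers images already in the dataset; a new query image still needs one distance computation per stored image.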

answered Oct 04 '22 at 10:10 by EnricoGiampieri
