I have a sample image which contains an object, such as the earrings in the following image:
I then have a large candidate set of images for which I need to determine which one most likely contains the object, e.g.:
So I need to produce a score for each image, where the highest score corresponds to the image which most likely contains the target object. Now, in this case, I have the following conditions/constraints to work with/around:
1) I can obtain multiple sample images at different angles.
2) The sample images are likely to be at different resolutions, angles, and distances than the candidate images.
3) There are a LOT of candidate images (> 10,000), so it must be reasonably fast.
4) I'm willing to sacrifice some precision for speed, so if it means we have to search through the top 100 instead of just the top 10, that's fine and can be done manually.
5) I can manipulate the sample images manually, such as outlining the object that I wish to detect; the candidate images cannot be manipulated manually as there are too many.
6) I have no real background in OpenCV or computer vision at all, so I'm starting from scratch here.
My initial thought is to start by drawing a rough outline around the object in the sample image. Then, I could identify corners in the object and corners in the candidate image. I could profile the pixels around each corner to see if they look similar and then rank by the sum of the maximum similarity scores of every corner. I'm also not sure how to quantify similar pixels. I guess just the Euclidean distance of their RGB values?
The problem there is that it kind of ignores the center of the object. In the above examples, if the corners of the earrings are all near the gold frame, then it would not consider the red, green, and blue stones inside the earring. I suppose I could improve this by then looking at all pairs of corners and determining similarity by sampling some points along the line between them.
So I have a few questions:
A) Does this line of thinking make sense in general or is there something I'm missing?
B) Which specific algorithms from OpenCV should I investigate using? I'm aware that there are multiple corner detection algorithms, but I only need one and if the differences are all optimizing on the margins then I'm fine with the fastest.
C) Any example code using the algorithms that would be helpful to aid in my understanding?
My options for languages are either Python or C#.
OpenCV has a bunch of pre-trained classifiers that can be used to identify objects such as trees, number plates, faces, eyes, etc. We can use any of these classifiers to detect the object as per our need.
Read the input images using cv2.imread() and convert them to grayscale. The height, width and number of channels of the images must be the same. Define a function to compute the Mean Squared Error between two images.
read() on an OpenCV VideoCapture returns two things: a boolean and the frame data. If you assign the result to a single variable instead of two, that variable receives the whole tuple. The boolean is mostly used for error catching.
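As a rough illustration of the Mean Squared Error step, here is a minimal sketch using NumPy only. The function name mean_squared_error is made up for this example, and the two tiny arrays stand in for grayscale images that would normally come from cv2.imread().

```python
import numpy as np

def mean_squared_error(img_a, img_b):
    """Return the mean squared per-pixel difference of two same-shaped images."""
    if img_a.shape != img_b.shape:
        raise ValueError("images must have the same shape")
    # Cast to float first: subtracting uint8 arrays would wrap around below zero.
    diff = img_a.astype(np.float64) - img_b.astype(np.float64)
    return float(np.mean(diff ** 2))

# Stand-ins for two grayscale images of identical size (hypothetical data).
a = np.zeros((4, 4), dtype=np.uint8)
b = np.full((4, 4), 10, dtype=np.uint8)

print(mean_squared_error(a, a))
print(mean_squared_error(a, b))
```

Because MSE compares pixels position by position, it is only meaningful when both images have the same size and the object sits in the same place, which is usually not the case for the candidate search described in the question.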
Fortunately, the kind guys from OpenCV just did that for you. Check in your samples folder "opencv\samples\cpp\matching_to_many_images.cpp". Compile and give it a try with the default images.
The algorithm can be easily adapted to make it faster or more precise.
Mainly, object recognition algorithms are split into two parts: keypoint detection and description, and object matching. For both of them there are many algorithms/variants, which you can play with directly in OpenCV.
Detection/description can be done by: SIFT/SURF/ORB/GFTT/STAR/FAST and others.
For matching you have: brute force, hamming, etc. (Some methods are specific for a given detection algorithm)
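To make the Hamming option concrete: binary descriptors (such as those ORB produces) are compared by counting differing bits rather than by Euclidean distance. The following is a minimal NumPy sketch of that idea, not OpenCV's own matcher; the function name hamming_distances and the two-byte descriptors are invented for illustration.

```python
import numpy as np

def hamming_distances(query, train):
    """Return a (len(query), len(train)) matrix of bitwise Hamming distances.

    Both inputs are uint8 arrays of shape (n, n_bytes), the usual layout
    for binary descriptors.
    """
    # XOR marks the differing bits; unpackbits expands bytes so they can be counted.
    xor = np.bitwise_xor(query[:, None, :], train[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)

# Hypothetical two-byte descriptors, far shorter than real ones.
query = np.array([[0b00001111, 0b00000000]], dtype=np.uint8)
train = np.array([[0b00001111, 0b00000001],
                  [0b11110000, 0b00000000]], dtype=np.uint8)

dist = hamming_distances(query, train)
print(dist)                 # distance from the query to each train descriptor
print(dist.argmin(axis=1))  # index of the closest train descriptor
```

A brute-force matcher does essentially this for every query descriptor and then keeps the closest one or two train descriptors.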
HINTS to start:
Crop your original image so the interesting object covers as much of the image area as possible. Use it as the training image.
SIFT is the most accurate and the slowest descriptor. FAST is a good combination of speed and accuracy. GFTT is old and quite unreliable. ORB is newly added to OpenCV and is very promising, both in speed and accuracy.
So, you can find the best combination for you by trial and error.
For the details of every implementation, you should read the original papers/tutorials. Google Scholar is a good place to start.
Check out the SURF features, which are a part of OpenCV. The idea here is that you have an algorithm for finding "interest points" in two images. You also have an algorithm for computing a descriptor of an image patch around each interest point. Typically this descriptor captures the distribution of edge orientations in the patch. Then you try to find point correspondences, i.e. for each interest point in image A you try to find a corresponding interest point in image B. This is accomplished by comparing the descriptors and looking for the closest matches. Then, if you have a set of correspondences that are related by some geometric transformation, you have a detection.
Of course, this is a very high level explanation. The devil is in the details, and for those you should read some papers. Start with "Distinctive Image Features from Scale-Invariant Keypoints" by David Lowe, and then read the papers on SURF.
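The "closest matches" step can be sketched without OpenCV at all. The snippet below is a minimal NumPy illustration, assuming float descriptors compared by Euclidean distance and the ratio test described in Lowe's paper. The function name ratio_test_matches, the 0.75 ratio and the two-dimensional toy descriptors are assumptions for this example; real SIFT or SURF descriptors have 64 or 128 dimensions.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Return (index_in_a, index_in_b) pairs that pass the ratio test.

    desc_a and desc_b are float arrays of shape (n, d); desc_b needs at
    least two rows so that a second-best match exists.
    """
    # Pairwise Euclidean distances between every descriptor in A and in B.
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    rows = np.arange(len(desc_a))
    best, second = order[:, 0], order[:, 1]
    # Keep a match only if the nearest neighbour is clearly closer than the runner-up.
    keep = dists[rows, best] < ratio * dists[rows, second]
    return [(int(i), int(j)) for i, j in zip(rows[keep], best[keep])]

# Toy descriptors: the first row of A has one obvious partner in B,
# the second row is equally close to two rows of B and is therefore ambiguous.
desc_a = np.array([[0.0, 0.0], [5.0, 5.0]])
desc_b = np.array([[0.1, 0.0], [5.0, 4.0], [5.0, 6.0], [9.0, 9.0]])

print(ratio_test_matches(desc_a, desc_b))  # → [(0, 0)]
```

Dropping ambiguous matches this way leaves fewer but more reliable correspondences for the geometric check.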
Also, consider moving this question to Signal and Image Processing Stack Exchange
In case someone comes along in the future, here's a small sample doing this with OpenCV. It's based on the OpenCV sample, but (in my opinion) this is a bit clearer, so I'm including it as well.
Tested with OpenCV 2.4.4
#!/usr/bin/env python
"""
Uses SURF to match two images.

Finds common features between two images and draws them.

Based on the sample code from opencv.

Usage:
    find_obj.py <image1> <image2>
"""
import sys

import numpy
import cv2
# Image Matching
def match_images(img1, img2, img1_features=None, img2_features=None):
    """Given two images, returns the matches"""
    # cv2.SURF is the OpenCV 2.4 constructor; 3200 is the Hessian threshold.
    detector = cv2.SURF(3200)
    matcher = cv2.BFMatcher(cv2.NORM_L2)

    # Reuse precomputed (keypoints, descriptors) when the caller supplies them.
    if img1_features is None:
        kp1, desc1 = detector.detectAndCompute(img1, None)
    else:
        kp1, desc1 = img1_features

    if img2_features is None:
        kp2, desc2 = detector.detectAndCompute(img2, None)
    else:
        kp2, desc2 = img2_features

    # print('img1 - %d features, img2 - %d features' % (len(kp1), len(kp2)))

    # For each descriptor in img1, find its two nearest neighbours in img2.
    raw_matches = matcher.knnMatch(desc1, trainDescriptors=desc2, k=2)
    kp_pairs = filter_matches(kp1, kp2, raw_matches)
    return kp_pairs
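def filter_matches(kp1, kp2, matches, ratio=0.75):
    """Filters features that are common to both images"""
    mkp1, mkp2 = [], []
    for m in matches:
        # Ratio test: keep a match only if the best candidate is clearly
        # closer than the second best.
        if len(m) == 2 and m[0].distance < m[1].distance * ratio:
            m = m[0]
            mkp1.append(kp1[m.queryIdx])
            mkp2.append(kp2[m.trainIdx])
    # list() so the result supports len() and truth tests on Python 3 as well.
    kp_pairs = list(zip(mkp1, mkp2))
    return kp_pairs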
# Match Displaying
def draw_matches(window_name, kp_pairs, img1, img2):
    """Draws the matches"""
    mkp1, mkp2 = zip(*kp_pairs)

    H = None
    status = None

    # A homography needs at least four point correspondences.
    if len(kp_pairs) >= 4:
        p1 = numpy.float32([kp.pt for kp in mkp1])
        p2 = numpy.float32([kp.pt for kp in mkp2])
        H, status = cv2.findHomography(p1, p2, cv2.RANSAC, 5.0)

    if len(kp_pairs):
        explore_match(window_name, img1, img2, kp_pairs, status, H)
def explore_match(win, img1, img2, kp_pairs, status=None, H=None):
    """Draws lines between the matched features"""
    h1, w1 = img1.shape[:2]
    h2, w2 = img2.shape[:2]

    # Place the two grayscale images side by side on one canvas.
    vis = numpy.zeros((max(h1, h2), w1 + w2), numpy.uint8)
    vis[:h1, :w1] = img1
    vis[:h2, w1:w1 + w2] = img2
    vis = cv2.cvtColor(vis, cv2.COLOR_GRAY2BGR)

    # Outline where img1 is projected into img2 by the homography.
    if H is not None:
        corners = numpy.float32([[0, 0], [w1, 0], [w1, h1], [0, h1]])
        reshaped = cv2.perspectiveTransform(corners.reshape(1, -1, 2), H)
        reshaped = reshaped.reshape(-1, 2)
        corners = numpy.int32(reshaped + (w1, 0))
        cv2.polylines(vis, [corners], True, (255, 255, 255))

    if status is None:
        status = numpy.ones(len(kp_pairs), numpy.bool_)
    p1 = numpy.int32([kpp[0].pt for kpp in kp_pairs])
    p2 = numpy.int32([kpp[1].pt for kpp in kp_pairs]) + (w1, 0)

    green = (0, 255, 0)
    red = (0, 0, 255)
    # Inliers get a green dot, outliers a red cross.
    for (x1, y1), (x2, y2), inlier in zip(p1, p2, status):
        if inlier:
            col = green
            cv2.circle(vis, (x1, y1), 2, col, -1)
            cv2.circle(vis, (x2, y2), 2, col, -1)
        else:
            col = red
            r = 2
            thickness = 3
            cv2.line(vis, (x1 - r, y1 - r), (x1 + r, y1 + r), col, thickness)
            cv2.line(vis, (x1 - r, y1 + r), (x1 + r, y1 - r), col, thickness)
            cv2.line(vis, (x2 - r, y2 - r), (x2 + r, y2 + r), col, thickness)
            cv2.line(vis, (x2 - r, y2 + r), (x2 + r, y2 - r), col, thickness)

    # Connect each inlier pair with a green line.
    for (x1, y1), (x2, y2), inlier in zip(p1, p2, status):
        if inlier:
            cv2.line(vis, (x1, y1), (x2, y2), green)

    cv2.imshow(win, vis)
# Test Main
if __name__ == '__main__':
    if len(sys.argv) < 3:
        print("No filenames specified")
        print("USAGE: find_obj.py <image1> <image2>")
        sys.exit(1)

    fn1 = sys.argv[1]
    fn2 = sys.argv[2]

    # Flag 0 loads the images as grayscale.
    img1 = cv2.imread(fn1, 0)
    img2 = cv2.imread(fn2, 0)

    if img1 is None:
        print('Failed to load fn1: %s' % fn1)
        sys.exit(1)

    if img2 is None:
        print('Failed to load fn2: %s' % fn2)
        sys.exit(1)

    kp_pairs = match_images(img1, img2)

    if kp_pairs:
        draw_matches('find_obj', kp_pairs, img1, img2)
        # Keep the window open until a key is pressed.
        cv2.waitKey()
        cv2.destroyAllWindows()
    else:
        print("No matches found")
