I'm attempting to wrap my head around the basics of CV. The bit that initially got me interested was template matching (it was mentioned in a Pycon talk unrelated to CV), so I figured I'd start there.
I started with this image:
Out of which I want to detect Mario. So I cut him out:
I understand the concept of sliding the template around the image to find the best fit, and following a tutorial, I'm able to find Mario with the following code:
import time
import cv  # legacy OpenCV 1.x Python bindings

def match_template(img, template):
    s = time.time()
    img_size = cv.GetSize(img)
    template_size = cv.GetSize(template)
    # The result map is (W - w + 1) x (H - h + 1): one score per placement.
    img_result = cv.CreateImage((img_size[0] - template_size[0] + 1,
                                 img_size[1] - template_size[1] + 1),
                                cv.IPL_DEPTH_32F, 1)
    cv.Zero(img_result)
    cv.MatchTemplate(img, template, img_result, cv.CV_TM_CCORR_NORMED)
    min_val, max_val, min_loc, max_loc = cv.MinMaxLoc(img_result)
    print min_val
    print max_val
    print min_loc
    print max_loc
    # Draw a rectangle at the best-scoring location.
    cv.Rectangle(img, max_loc,
                 (max_loc[0] + template.width, max_loc[1] + template.height),
                 cv.Scalar(120.), 2)
    print time.time() - s
    cv.NamedWindow("Result")
    cv.ShowImage("Result", img)
    cv.WaitKey(0)
    cv.DestroyAllWindows()
So far so good, but then I came to realize that this is incredibly fragile. It will only ever find Mario with that specific background, and with that specific animation frame being displayed.
So I'm curious: given that Mario will always have the same Mario-ish attributes (size, colors), is there a technique with which I could find him regardless of whether his current frame is standing still or one of the various run-cycle sprites? Kind of like the fuzzy matching you can do on strings, but for images.
Maybe since he's the only red thing, there is a way of simply tracking the red pixels?
The whole other issue is removing the background from the template. Maybe that would help MatchTemplate find Mario even though he doesn't exactly match the template? As of now, I'm not entirely sure how that would work (I see that there is a mask param in MatchTemplate, but I'll have to investigate further).
My main question is whether or not template matching is the way to go about detecting an image that is mostly the same, but varies (like when he's walking), or is there another technique I should look into?
Going off of mmgp's suggestion that it should be workable for matching other things, I ran a couple of tests.
I used this as the template to match:
And then took a couple of screen shots to test the matching against.
For the first, I successfully find Mario, and get a max value of 1.
However, trying to find jumping Mario results in a complete misfire.
Now granted, the Mario in the template and the Mario in the scene are facing opposite directions and are different animation frames, but I would think they still match a lot more than anything else in the image -- if only for the colors alone. Yet it targets the platform as the closest match to the template.
Note that the max value for this one was 0.728053808212.
Next I tried a scene without mario to see what would happen.
But oddly enough, I get the exact same result as the image with jumping Mario -- right down to the similarity value: 0.728053808212. Mario being in the picture is just as accurate as him not being in the picture.
Really strange! I don't know the actual details of the underlying algorithm, but I'd imagine, from a standard-deviation perspective, the boxes in the scene that at least match the red of Mario's suit in the template would be closer to the mean distance than a blue platform, right? So it's extra confusing that the match isn't even in the general area of where I would expect it to be.
I'm guessing this is user error on my end, or maybe just a misunderstanding.
Why would a scene with a similar Mario have as much of a match as a scene with no Mario at all?
No method is infallible, but template matching does have a good chance of working here. It might require some pre-processing, and until there is a larger sample (a short video, for example) to demonstrate the possible problems, there isn't much point in trying more advanced methods simply because some library implements them for you -- especially if you don't know under which conditions they are expected to work.
For instance, here are the results I get using template matching (red rectangles) -- all of them use the template http://i.stack.imgur.com/EYs9B.png, even the last one:
To achieve that I started by considering only the red channel of both the template and the input image. From that we can easily calculate the internal morphological gradient, and only then perform the matching. To avoid drawing a rectangle when Mario is not present, a minimum threshold on the match score is needed. Here is the template and one of the images after these two transformations:
And here is some sample code to achieve that:
import sys
import cv2
import numpy

img = cv2.imread(sys.argv[1])
# Red channel only (Mario's suit is the strongest red in the scene).
img2 = img[:,:,2]
# Internal morphological gradient: the image minus its erosion.
img2 = img2 - cv2.erode(img2, None)
template = cv2.imread(sys.argv[2])[:,:,2]
template = template - cv2.erode(template, None)

ccnorm = cv2.matchTemplate(img2, template, cv2.TM_CCORR_NORMED)
print ccnorm.max()
loc = numpy.where(ccnorm == ccnorm.max())
threshold = 0.4
th, tw = template.shape[:2]
for pt in zip(*loc[::-1]):
    # Skip the best match if it doesn't clear the minimum threshold.
    if ccnorm[pt[::-1]] < threshold:
        continue
    cv2.rectangle(img, pt, (pt[0] + tw, pt[1] + th),
                  (0, 0, 255), 2)
cv2.imwrite('result.png', img)  # write elsewhere; sys.argv[2] is the template
I expect it to fail in more varied situations, but there are a couple of easy adjustments to be done.
Template matching doesn't always give good results. You should look into keypoint matching.
Let's assume that you managed to cut out Mario or get an ROI image of him. Make this image your template. Now, find keypoints in the main image and also in the template, so that you have two sets of keypoints: one for the image and the other for Mario (the template).
You can use SIFT, SURF, ORB depending on your preferences.
[EDIT]:
This is what I got using this method with SIFT and FLANN-based kNN matching. I haven't done the bounding-box part.
Since your template is very small, SIFT and SURF would not give many keypoints. But to get a good number of feature points, you could try the Harris corner detector. I applied Harris corners to the image and got pretty good points on Mario.
If you have used SIFT or SURF, you'd have descriptors for both the image and the template. Match these keypoints using kNN or some other efficient matching algorithm. If you are using OpenCV, I'd suggest looking into the FLANN-based matcher. After matching the keypoints, you will want to filter out the incorrect matches. You can do this with k-nearest neighbours: depending on the distance to the nearest match, you can further filter out keypoints. You can also filter your matches using the forward-backward error.
[EDIT]: If you are using the Harris corner detector, you'd get only points, not keypoints. You can either convert them into keypoints or write your own brute-force matcher. It's not that difficult.
After filtering the keypoints, you'd have a cluster of keypoints near your object (in this case, Mario) and a few scattered keypoints elsewhere. To eliminate these scattered keypoints, you could use clustering. DBSCAN clustering will help you get a good cluster of points.
Now you have a cluster of keypoints. Using k-means, you should try to find the center of the cluster. Once you obtain the center of the cluster, you can estimate the bounding box.
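The outlier-removal, cluster-centre, and bounding-box steps above can be sketched with plain NumPy. This is a minimal density filter in the spirit of DBSCAN (sklearn.cluster.DBSCAN would do this properly), with hypothetical match coordinates:

```python
import numpy as np

# Hypothetical matched keypoint locations: a dense cluster on the
# object plus a couple of scattered false matches.
pts = np.array([[50, 60], [52, 61], [51, 63], [53, 59], [49, 62],
                [200, 30], [10, 180]], dtype=float)

# Keep points with at least min_pts neighbours within eps
# (each point counts itself); scattered outliers fail this test.
eps, min_pts = 10.0, 3
dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
cluster = pts[(dists < eps).sum(axis=1) >= min_pts]

# Cluster centre (what k-means with k=1 would return) and bounding box.
cx, cy = cluster.mean(axis=0)
x0, y0 = cluster.min(axis=0)
x1, y1 = cluster.max(axis=0)
print((cx, cy), (x0, y0, x1, y1))
```

The min/max extent of the surviving cluster is a reasonable first estimate of the bounding box; padding it by the template size would tighten it up.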
I hope this helps.
[EDIT]
Trying to match points using Harris corners. After filtering the Harris corners, I'm using a brute-force method to match the points. Some better algorithm might give you better results.