
Using OpenCV to match an image from a group of images for purpose of identification in C++


EDIT: I've acquired enough reputation through this post to be able to edit it with more links, which will help me get my point across better.

People playing The Binding of Isaac often come across important items on little pedestals.

The goal is to have a user who is confused about what an item is press a button, which will then instruct him to "box" the item (think Windows desktop box selection). The box gives us the region of interest (the actual item plus some background environment) to compare against what will be an entire grid of items.

Theoretical user-boxed item: [image]

Theoretical grid of items (there aren't many more; I just ripped this out of the Binding of Isaac wiki): [image]

The location in the grid where the boxed item is identified would correspond to a certain area of the image, which in turn maps to the proper Binding of Isaac wiki link with information on that item.

In the grid, the item is in the 1st column, 3rd row from the bottom. I use these two images in everything I tried below.


My goal is to create a program that can take a manual crop of an item from the game "The Binding of Isaac", identify the cropped item by comparing the image against an image of a table of the items in the game, and then display the proper wiki page.

This would be my first "real project" in the sense that it requires a huge amount of library learning to get what I want done. It's been a bit overwhelming.

I've messed with a few options just from googling around. (You can quickly find the tutorials I used by searching the name of the method plus "opencv"; my account is heavily restricted with link posting for some reason.)

Using BruteForceMatcher:

http://docs.opencv.org/doc/tutorials/features2d/feature_description/feature_description.html

#include <stdio.h>
#include <iostream>
#include "opencv2/core/core.hpp"
#include <opencv2/legacy/legacy.hpp>
#include <opencv2/nonfree/features2d.hpp>
#include "opencv2/highgui/highgui.hpp"

using namespace cv;

void readme();

/** @function main */
int main( int argc, char** argv )
{
  if( argc != 3 )
  { return -1; }

  Mat img_1 = imread( argv[1], CV_LOAD_IMAGE_GRAYSCALE );
  Mat img_2 = imread( argv[2], CV_LOAD_IMAGE_GRAYSCALE );

  if( !img_1.data || !img_2.data )
  { return -1; }

  //-- Step 1: Detect the keypoints using SURF Detector
  int minHessian = 400;

  SurfFeatureDetector detector( minHessian );

  std::vector<KeyPoint> keypoints_1, keypoints_2;

  detector.detect( img_1, keypoints_1 );
  detector.detect( img_2, keypoints_2 );

  //-- Step 2: Calculate descriptors (feature vectors)
  SurfDescriptorExtractor extractor;

  Mat descriptors_1, descriptors_2;

  extractor.compute( img_1, keypoints_1, descriptors_1 );
  extractor.compute( img_2, keypoints_2, descriptors_2 );

  //-- Step 3: Matching descriptor vectors with a brute force matcher
  BruteForceMatcher< L2<float> > matcher;
  std::vector< DMatch > matches;
  matcher.match( descriptors_1, descriptors_2, matches );

  //-- Draw matches
  Mat img_matches;
  drawMatches( img_1, keypoints_1, img_2, keypoints_2, matches, img_matches );

  //-- Show detected matches
  imshow( "Matches", img_matches );

  waitKey(0);

  return 0;
}

/** @function readme */
void readme()
{ std::cout << " Usage: ./SURF_descriptor <img1> <img2>" << std::endl; }

[image: BruteForceMatcher results]

This results in not-so-useful-looking output. FLANN produces cleaner but equally unreliable results.

http://docs.opencv.org/doc/tutorials/features2d/feature_flann_matcher/feature_flann_matcher.html

#include <stdio.h>
#include <iostream>
#include "opencv2/core/core.hpp"
#include <opencv2/legacy/legacy.hpp>
#include <opencv2/nonfree/features2d.hpp>
#include "opencv2/highgui/highgui.hpp"

using namespace cv;

void readme();

/** @function main */
int main( int argc, char** argv )
{
  if( argc != 3 )
  { readme(); return -1; }

  Mat img_1 = imread( argv[1], CV_LOAD_IMAGE_GRAYSCALE );
  Mat img_2 = imread( argv[2], CV_LOAD_IMAGE_GRAYSCALE );

  if( !img_1.data || !img_2.data )
  { std::cout << " --(!) Error reading images " << std::endl; return -1; }

  //-- Step 1: Detect the keypoints using SURF Detector
  int minHessian = 400;

  SurfFeatureDetector detector( minHessian );

  std::vector<KeyPoint> keypoints_1, keypoints_2;

  detector.detect( img_1, keypoints_1 );
  detector.detect( img_2, keypoints_2 );

  //-- Step 2: Calculate descriptors (feature vectors)
  SurfDescriptorExtractor extractor;

  Mat descriptors_1, descriptors_2;

  extractor.compute( img_1, keypoints_1, descriptors_1 );
  extractor.compute( img_2, keypoints_2, descriptors_2 );

  //-- Step 3: Matching descriptor vectors using FLANN matcher
  FlannBasedMatcher matcher;
  std::vector< DMatch > matches;
  matcher.match( descriptors_1, descriptors_2, matches );

  double max_dist = 0; double min_dist = 100;

  //-- Quick calculation of max and min distances between keypoints
  for( int i = 0; i < descriptors_1.rows; i++ )
  { double dist = matches[i].distance;
    if( dist < min_dist ) min_dist = dist;
    if( dist > max_dist ) max_dist = dist;
  }

  printf("-- Max dist : %f \n", max_dist );
  printf("-- Min dist : %f \n", min_dist );

  //-- Draw only "good" matches (i.e. whose distance is less than 2*min_dist )
  //-- PS.- radiusMatch can also be used here.
  std::vector< DMatch > good_matches;

  for( int i = 0; i < descriptors_1.rows; i++ )
  { if( matches[i].distance < 2*min_dist )
    { good_matches.push_back( matches[i] ); }
  }

  //-- Draw only "good" matches
  Mat img_matches;
  drawMatches( img_1, keypoints_1, img_2, keypoints_2,
               good_matches, img_matches, Scalar::all(-1), Scalar::all(-1),
               vector<char>(), DrawMatchesFlags::NOT_DRAW_SINGLE_POINTS );

  //-- Show detected matches
  imshow( "Good Matches", img_matches );

  for( int i = 0; i < (int)good_matches.size(); i++ )
  { printf( "-- Good Match [%d] Keypoint 1: %d  -- Keypoint 2: %d  \n", i, good_matches[i].queryIdx, good_matches[i].trainIdx ); }

  waitKey(0);

  return 0;
}

/** @function readme */
void readme()
{ std::cout << " Usage: ./SURF_FlannMatcher <img1> <img2>" << std::endl; }

[image: FLANN matcher results]
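An aside for anyone else trying this: the tutorial's "good matches" filter (distance < 2*min_dist) is pretty crude. A refinement I've seen recommended, though I haven't tuned it for these images, is Lowe's ratio test via knnMatch, roughly like this:

#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>

using namespace cv;

// Lowe's ratio test: keep a match only when its best candidate is clearly
// better than the second-best. Replaces the "2*min_dist" filter above.
std::vector<DMatch> ratioTestMatches( const Mat& descriptors_1,
                                      const Mat& descriptors_2,
                                      float ratio = 0.7f )
{
    FlannBasedMatcher matcher;
    std::vector< std::vector<DMatch> > knnMatches;
    matcher.knnMatch( descriptors_1, descriptors_2, knnMatches, 2 );

    std::vector<DMatch> good_matches;
    for( size_t i = 0; i < knnMatches.size(); i++ )
        if( knnMatches[i].size() == 2 &&
            knnMatches[i][0].distance < ratio * knnMatches[i][1].distance )
            good_matches.push_back( knnMatches[i][0] );
    return good_matches;
}

The 0.7 ratio is the usual starting point; raising it keeps more (but noisier) matches.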

Template matching has been my best method so far. Across the 6 matching methods, though, it ranges from only 0 to 4 correct identifications.

http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html

#include "opencv2/highgui/highgui.hpp" #include "opencv2/imgproc/imgproc.hpp" #include <iostream> #include <stdio.h>  using namespace std; using namespace cv;  /// Global Variables Mat img; Mat templ; Mat result; char* image_window = "Source Image"; char* result_window = "Result window";  int match_method; int max_Trackbar = 5;  /// Function Headers void MatchingMethod( int, void* );  /** @function main */ int main( int argc, char** argv ) {   /// Load image and template   img = imread( argv[1], 1 );   templ = imread( argv[2], 1 );    /// Create windows   namedWindow( image_window, CV_WINDOW_AUTOSIZE );   namedWindow( result_window, CV_WINDOW_AUTOSIZE );    /// Create Trackbar   char* trackbar_label = "Method: \n 0: SQDIFF \n 1: SQDIFF NORMED \n 2: TM CCORR \n 3: TM CCORR NORMED \n 4: TM COEFF \n 5: TM COEFF NORMED";   createTrackbar( trackbar_label, image_window, &match_method, max_Trackbar, MatchingMethod );    MatchingMethod( 0, 0 );    waitKey(0);   return 0; }  /**  * @function MatchingMethod  * @brief Trackbar callback  */ void MatchingMethod( int, void* ) {   /// Source image to display   Mat img_display;   img.copyTo( img_display );    /// Create the result matrix   int result_cols =  img.cols - templ.cols + 1;   int result_rows = img.rows - templ.rows + 1;    result.create( result_cols, result_rows, CV_32FC1 );    /// Do the Matching and Normalize   matchTemplate( img, templ, result, match_method );   normalize( result, result, 0, 1, NORM_MINMAX, -1, Mat() );    /// Localizing the best match with minMaxLoc   double minVal; double maxVal; Point minLoc; Point maxLoc;   Point matchLoc;    minMaxLoc( result, &minVal, &maxVal, &minLoc, &maxLoc, Mat() );    /// For SQDIFF and SQDIFF_NORMED, the best matches are lower values. For all the other methods, the higher the better   if( match_method  == CV_TM_SQDIFF || match_method == CV_TM_SQDIFF_NORMED )     { matchLoc = minLoc; }   else     { matchLoc = maxLoc; }    /// Show me what you got   rectangle( img_display, matchLoc, Point( matchLoc.x + templ.cols , matchLoc.y + templ.rows ), Scalar::all(0), 2, 8, 0 );   rectangle( result, matchLoc, Point( matchLoc.x + templ.cols , matchLoc.y + templ.rows ), Scalar::all(0), 2, 8, 0 );    imshow( image_window, img_display );   imshow( result_window, result );    return; } 

http://imgur.com/pIRBPQM,h0wkqer,1JG0QY0,haLJzRF,CmrlTeL,DZuW73V#3

Of the 6 methods: fail, pass, fail, pass, pass, pass.

This was sort of a best-case result, though. The next item I tried was:

[image], and it resulted in fail, fail, fail, fail, fail, fail.

From item to item, each of these methods works well on some and does terribly on others.
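For what it's worth, whichever method wins, the final step I have in mind is a decision across the whole grid rather than a single match window. Roughly like this, assuming I've pre-cut the grid into one image per item (the helper name is mine, and this is untested):

#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

using namespace cv;

// Score the user's crop against every item template and return the index
// of the best match (or -1 if nothing was comparable).
int bestItem( const Mat& crop, const std::vector<Mat>& itemTemplates )
{
    int bestIdx = -1;
    double bestScore = -1.0;
    for( size_t i = 0; i < itemTemplates.size(); i++ )
    {
        const Mat& templ = itemTemplates[i];
        // matchTemplate needs the searched image at least template-sized
        if( crop.cols < templ.cols || crop.rows < templ.rows )
            continue;
        Mat result;
        matchTemplate( crop, templ, result, CV_TM_CCOEFF_NORMED );
        double maxVal;
        minMaxLoc( result, 0, &maxVal, 0, 0 );
        if( maxVal > bestScore ) { bestScore = maxVal; bestIdx = (int)i; }
    }
    return bestIdx;
}

Whichever index comes back would then map to the wiki link for that item.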

So I'll ask: is template matching my best bet, or is there a method I'm not considering that will be my holy grail?

How can I get a USER to create the crop manually? OpenCV's documentation on this is really bad, and the examples I find online are extremely old C++ or straight C.
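From what I've pieced together so far, cv::setMouseCallback seems to be the route for the boxing itself. A minimal sketch of what I mean, with all window names and filenames made up, and not verified end to end:

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>

using namespace cv;

Mat frame;                 // screenshot of the game
Point origin;              // corner where the drag started
Rect selection;            // current box
bool selecting = false;

void onMouse( int event, int x, int y, int, void* )
{
    if( event == CV_EVENT_LBUTTONDOWN )
    {
        origin = Point( x, y );
        selection = Rect();
        selecting = true;
    }
    else if( event == CV_EVENT_MOUSEMOVE && selecting )
    {
        // Rect(p1, p2) normalizes the corners, so dragging in any
        // direction works; clip to the frame just in case.
        selection = Rect( origin, Point( x, y ) )
                    & Rect( 0, 0, frame.cols, frame.rows );
    }
    else if( event == CV_EVENT_LBUTTONUP )
    {
        selecting = false;
        if( selection.area() > 0 )
            imshow( "crop", frame( selection ) );   // the user's boxed item
    }
}

int main()
{
    frame = imread( "screenshot.png" );
    if( !frame.data ) return -1;
    namedWindow( "game" );
    setMouseCallback( "game", onMouse, 0 );
    imshow( "game", frame );
    waitKey( 0 );
    return 0;
}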

Thanks for any help. This venture has been an interesting experience so far. I had to strip all of the links which would better portray how everything's been working out, but the site is saying I'm posting more than 10 links even when I'm not.


Some more examples of items throughout the game:

The rock is a rare item and one of the few that can be "anywhere" on the screen. Items like the rock are the reason a user-made crop is the best way to isolate the item; otherwise, item positions are limited to a couple of specific places.

[image]

[image]

An item after a boss fight: lots of stuff everywhere, and transparency in the middle. I would imagine this being one of the harder ones to get working correctly.

[image]

[image]

Rare room. Simple background. No item transparency.

[image]

[image]

Here are the two tables that contain all of the items in the game. I'll combine them into one image eventually, but for now they were taken directly from the Isaac wiki.

[image]

[image]

2c2c, asked Feb 07 '13




2 Answers

One important detail here is that you have a pure image of every item in your table. You know the color of the background and can detach the item from the rest of the picture. For example, in addition to the matrix representing the image itself, you may store a matrix of 1s and 0s of the same size, where ones correspond to the image area and zeros to the background. Let's call this matrix the "mask" and the pure image of the item the "pattern".
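For instance, a minimal sketch of building such a mask, assuming the pattern's background is one known flat color (the magenta value and filename below are just placeholders):

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>

using namespace cv;

int main()
{
    // Pure image of the item on a uniform background color.
    Mat pattern = imread( "item.png" );
    Scalar background( 255, 0, 255 );   // assumed BGR background color

    // 255 where the pixel equals the background color, 0 elsewhere...
    Mat backgroundMask;
    inRange( pattern, background, background, backgroundMask );

    // ...then invert so the mask is 255 on the item and 0 on background.
    Mat mask;
    bitwise_not( backgroundMask, mask );

    imshow( "mask", mask );
    waitKey( 0 );
    return 0;
}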

There are 2 ways to compare images: match the image against the pattern, or match the pattern against the image. What you have described is matching the image against the pattern: you have some cropped image and want to find a similar pattern. Instead, think about searching for the pattern on the image.

Let's first define a function match() that takes a pattern, a mask, and an image of the same size and checks whether the area of the pattern under the mask is exactly the same as in the image (pseudocode):

def match(pattern, mask, image):
    for x = 0 to pattern.width:
        for y = 0 to pattern.height:
            if mask[x, y] == 1 and            # if in pattern this pixel is not part of background
               pattern[x, y] != image[x, y]:  # and pixels on pattern and image differ
                return False
    return True

But the sizes of the pattern and the cropped image may differ. The standard solution for this (used, for example, in cascade classifiers) is a sliding window: just move the pattern "window" across the image and check if the pattern matches the selected region. This is pretty much how object detection works in OpenCV.

Of course, this solution is not very robust: cropping, resizing, or any other image transformations may change some pixels, and in that case match() will always return false. To overcome this, instead of a boolean answer you can use the distance between the image and the pattern. In that case match() should return some value of similarity, say, between 0 and 1, where 1 stands for "exactly the same" and 0 for "completely different". Then you either set a threshold for similarity (e.g. the image should be at least 85% similar to the pattern), or just select the pattern with the highest similarity.
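A rough C++ rendering of this idea, counting agreement only under the mask and sliding the pattern across the image (plain pixel equality here; in practice you would allow some per-pixel tolerance):

#include <algorithm>
#include <opencv2/core/core.hpp>

using namespace cv;

// Fraction of masked pattern pixels that agree with the image region whose
// top-left corner is (ox, oy). All Mats are single-channel 8-bit.
double similarityAt( const Mat& image, const Mat& pattern, const Mat& mask,
                     int ox, int oy )
{
    int same = 0, total = 0;
    for( int y = 0; y < pattern.rows; y++ )
        for( int x = 0; x < pattern.cols; x++ )
        {
            if( mask.at<uchar>(y, x) == 0 )
                continue;                        // background pixel, ignore
            total++;
            if( pattern.at<uchar>(y, x) == image.at<uchar>(oy + y, ox + x) )
                same++;
        }
    return total > 0 ? (double)same / total : 0.0;
}

// Slide the pattern over every position of the image; return best similarity.
double bestSimilarity( const Mat& image, const Mat& pattern, const Mat& mask )
{
    double best = 0.0;
    for( int oy = 0; oy + pattern.rows <= image.rows; oy++ )
        for( int ox = 0; ox + pattern.cols <= image.cols; ox++ )
            best = std::max( best, similarityAt( image, pattern, mask, ox, oy ) );
    return best;   // compare against a threshold like 0.85, as described above
}

Note this brute-force scan costs on the order of image area times pattern area, which is fine for images this small.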

Since the items in the game are artificial images with very little variation, this approach should be enough. However, for more complicated cases you will need features other than simply the pixels under the mask. As I already suggested in my comment, methods like Eigenfaces, a cascade classifier using Haar-like features, or even Active Appearance Models may be more effective for those tasks. As for SURF, as far as I know it's better suited for tasks with varying angle and size of the object, not for different backgrounds and the like.

ffriend, answered Oct 26 '22


I came upon your question while trying to figure out my own template-matching issue, and now I'm back to share what I think might be your best bet based on my own experience. You've probably long since abandoned this, but hey, someone else might be in similar shoes one day.

None of the items that you shared are a solid rectangle, and since template matching in OpenCV (as of the 2.x versions) cannot work with a mask, you'll always be comparing your reference image against what I must assume is at least several different backgrounds (not to mention the items that are found in varied locations on different backgrounds, making the template match even worse).
It will always be comparing the background pixels and confounding your match unless you can collect a crop of every single situation where the reference image can be found. If decals of blood etc. introduce yet more variability into the backgrounds around the items, then template matching probably won't get great results.
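A caveat on the above: OpenCV releases from 3.x onward did add an optional mask argument to matchTemplate for some of the methods (TM_SQDIFF and TM_CCORR_NORMED, as I understand it). If a newer version is an option for you, a sketch along these lines, with placeholder filenames, avoids comparing the background pixels:

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui.hpp>

using namespace cv;

int main()
{
    Mat scene = imread( "crop.png" );                        // user's boxed region
    Mat item  = imread( "item.png" );                        // reference item
    Mat mask  = imread( "item_mask.png", IMREAD_GRAYSCALE ); // 255 = item pixels

    // TM_CCORR_NORMED is one of the methods that honors the mask in 3.x.
    Mat result;
    matchTemplate( scene, item, result, TM_CCORR_NORMED, mask );

    double maxVal;
    Point maxLoc;
    minMaxLoc( result, 0, &maxVal, 0, &maxLoc );

    // Values near 1.0 mean the item pixels (and only those) matched well.
    rectangle( scene, maxLoc,
               Point( maxLoc.x + item.cols, maxLoc.y + item.rows ),
               Scalar( 0, 255, 0 ), 2 );
    imshow( "match", scene );
    waitKey( 0 );
    return 0;
}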

So here are the two things I would try, depending on some details:

  1. If possible, crop a reference template of every situation where the item is found (this will not be a good time), then compare the user-specified area against every template of every item. Take the best result from these comparisons and you will, if lucky, have a correct match.
  2. The example screenshots you shared don't have any dark/black lines on the background, so the outlines of all of the items stand out. If this is consistent throughout the game, you can find edges within the user-specified area and detect the exterior contours. Ahead of time, you would have processed the exterior contours of each reference item and stored those contours. Then you can compare the contour(s) in the user's crop against each contour in your database, taking the best match as the answer (a sketch of this idea follows the list).
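Here's a rough sketch of option 2 to make it concrete; the Canny thresholds are guesses, the helper names are mine, and the reference contours are assumed to have been extracted the same way and stored ahead of time:

#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

using namespace cv;

// Largest exterior contour of a grayscale crop (empty if none found).
std::vector<Point> largestExteriorContour( const Mat& gray )
{
    Mat edges;
    Canny( gray, edges, 50, 150 );                // thresholds are guesses
    std::vector< std::vector<Point> > contours;
    findContours( edges, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE );

    if( contours.empty() )
        return std::vector<Point>();
    size_t best = 0;
    for( size_t i = 1; i < contours.size(); i++ )
        if( contourArea( contours[i] ) > contourArea( contours[best] ) )
            best = i;
    return contours[best];
}

// Index of the stored reference contour most similar to the crop's contour;
// matchShapes returns lower values for more similar shapes.
int bestReference( const std::vector<Point>& cropContour,
                   const std::vector< std::vector<Point> >& references )
{
    int bestIdx = -1;
    double bestScore = 1e9;
    for( size_t i = 0; i < references.size(); i++ )
    {
        double score = matchShapes( cropContour, references[i],
                                    CV_CONTOURS_MATCH_I1, 0 );
        if( score < bestScore ) { bestScore = score; bestIdx = (int)i; }
    }
    return bestIdx;
}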

I'm confident either of those could work for you, depending on whether the game is well-represented by your screenshots.

Note: The contour matching will be much, much faster than the template matching. Fast enough to run in real time and perhaps negate the need for the user to crop anything at all.

Christopher Peterson, answered Oct 26 '22