How to train OpenCV SVM with BoW Properly

I can't train the SVM to recognize my object. I'm trying to do this using SURF + Bag of Words + SVM. My problem is that the classifier does not detect anything; all the prediction results are 0.

Here is my code:

#include <opencv2/opencv.hpp>
#include <opencv2/nonfree/nonfree.hpp>
#include <dirent.h>
#include <sys/stat.h>
#include <sys/types.h>

using namespace cv;
using namespace std;

Ptr<FeatureDetector> detector = FeatureDetector::create("SURF");
Ptr<DescriptorExtractor> descriptors = DescriptorExtractor::create("SURF");

string to_string(const int val) {
    int i = val;
    std::string s;
    std::stringstream out;
    out << i;
    s = out.str();
    return s;
}

Mat compute_features(Mat image) {
    vector<KeyPoint> keypoints;
    Mat features;

    detector->detect(image, keypoints);
    KeyPointsFilter::retainBest(keypoints, 1500);
    descriptors->compute(image, keypoints, features);

    return features;
}

BOWKMeansTrainer addFeaturesToBOWKMeansTrainer(String dir, BOWKMeansTrainer& bowTrainer) {
    DIR *dp;
    struct dirent *dirp;
    struct stat filestat;

    dp = opendir(dir.c_str());


    Mat features;
    Mat img;

    string filepath;
    #pragma loop(hint_parallel(4))
    while ((dirp = readdir(dp))) {
        filepath = dir + dirp->d_name;

        cout << "Reading... " << filepath << endl;

        if (stat( filepath.c_str(), &filestat )) continue;
        if (S_ISDIR( filestat.st_mode ))         continue;

        img = imread(filepath, 0);

        features = compute_features(img);
        bowTrainer.add(features);
    }
    closedir(dp);

    return bowTrainer;
}

void computeFeaturesWithBow(string dir, Mat& trainingData, Mat& labels, BOWImgDescriptorExtractor& bowDE, int label) {
    DIR *dp;
    struct dirent *dirp;
    struct stat filestat;

    dp = opendir(dir.c_str());

    vector<KeyPoint> keypoints;
    Mat features;
    Mat img;

    string filepath;

    #pragma loop(hint_parallel(4))
    while ((dirp = readdir(dp))) {
        filepath = dir + dirp->d_name;

        cout << "Reading: " << filepath << endl;

        if (stat( filepath.c_str(), &filestat )) continue;
        if (S_ISDIR( filestat.st_mode ))         continue;

        img = imread(filepath, 0);

        detector->detect(img, keypoints);
        bowDE.compute(img, keypoints, features);

        trainingData.push_back(features);
        labels.push_back((float) label);
    }
    closedir(dp);

    cout << string( 100, '\n' );
}

int main() {
    initModule_nonfree();

    Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("FlannBased");

    TermCriteria tc(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 10, 0.001);
    int dictionarySize = 1000;
    int retries = 1;
    int flags = KMEANS_PP_CENTERS;
    BOWKMeansTrainer bowTrainer(dictionarySize, tc, retries, flags);
    BOWImgDescriptorExtractor bowDE(descriptors, matcher);


    cout << "Add Features to KMeans" << endl;
    addFeaturesToBOWKMeansTrainer("./positive_large/", bowTrainer);
    addFeaturesToBOWKMeansTrainer("./negative_large/", bowTrainer);

    cout << endl << "Clustering..." << endl;

    Mat dictionary = bowTrainer.cluster();
    bowDE.setVocabulary(dictionary);

    Mat labels(0, 1, CV_32FC1);
    Mat trainingData(0, dictionarySize, CV_32FC1);


    cout << endl << "Extract bow features" << endl;

    computeFeaturesWithBow("./positive_large/", trainingData, labels, bowDE, 1);
    computeFeaturesWithBow("./negative_large/", trainingData, labels, bowDE, 0);

    CvSVMParams params;
    params.kernel_type=CvSVM::RBF;
    params.svm_type=CvSVM::C_SVC;
    params.gamma=0.50625000000000009;
    params.C=312.50000000000000;
    params.term_crit=cvTermCriteria(CV_TERMCRIT_ITER,100,0.000001);
    CvSVM svm;

    cout << endl << "Begin training" << endl;

    bool res=svm.train(trainingData,labels,cv::Mat(),cv::Mat(),params);

    svm.save("classifier.xml");

    //CvSVM svm;
    svm.load("classifier.xml");

    VideoCapture cap(0); // open the default camera

    if(!cap.isOpened())  // check if we succeeded
        return -1;

    Mat featuresFromCam, grey;
    vector<KeyPoint> cameraKeyPoints;
    namedWindow("edges",1);
    for(;;)
    {
        Mat frame;
        cap >> frame; // get a new frame from camera
        cvtColor(frame, grey, CV_BGR2GRAY);
        detector->detect(grey, cameraKeyPoints);
        bowDE.compute(grey, cameraKeyPoints, featuresFromCam);

        cout << svm.predict(featuresFromCam) << endl;
        imshow("edges", frame);
        if(waitKey(30) >= 0) break;
    }   

    return 0;
}

Note that I took the SVM parameters from an existing project that had good results, so I assumed they would be useful in my code too (though apparently not).

I have 310 positive images and 508 negative images. I also tried using equal numbers of positive and negative images, but the result is the same. The object I want to detect is a car steering wheel. Here is my dataset.

Do you have any idea what I'm doing wrong? Thank you!

asked Mar 18 '23 by dephinera


1 Answer

First of all, reusing parameters from an existing project doesn't mean you are using correct parameters. In fact, in my opinion it is a flawed approach (no offense), because optimal SVM parameters depend directly on the dataset and on the descriptor extraction method. To get correct parameters you have to do cross-validation, so parameters obtained from a different recognition task won't make sense here. For example, in my face verification project the optimal values were 0.0625 for gamma and 10 for C.

Another important issue is your test images. As far as I can see from your code, you are not testing the classifier on images from disk, so from here on I'll make some assumptions. If the test images you acquire from the camera differ from your positive training images, the classifier will fail. By "different" I mean this: you have to make sure your test images contain only steering wheels, because your training images contain only steering wheels. If a test image also contains, for instance, a car seat, its BoW descriptor will be completely different from the BoW descriptors of your training images. So your test images shouldn't show steering wheels together with other objects; they should contain only steering wheels.

If you satisfy these conditions, testing the system on the training images themselves is the most basic sanity check. If it fails even in that scenario, you probably have implementation issues. Another approach is to split your training data in two, so that you have four partitions:

  • Positive train images
  • Negative train images
  • Positive test images
  • Negative test images

Use only the train images for training the system, and evaluate it on the test images. And again, you have to tune the parameters via cross-validation.

Before doing the things above, you might also want to check some specific steps in order to localize the problem:

  1. How many keypoints are detected for each image? Similar images should yield a similar number of keypoints.
  2. Remember that the BoW descriptor is a histogram of the SURF descriptors of an image. Make sure that similar images produce similar histograms (BoW descriptors). It is best to check this by visualizing the histograms.
  3. If the previous step checks out, the problem is most likely in the SVM training step, which is a very important step (maybe the most important one).

I hope I was able to emphasize the importance of the cross-validation. Do the cross-validation!

Good luck!

answered Mar 19 '23 by guneykayim