OpenCV/C++ program slower than its numpy counterpart, what should I do?

I implemented the Procrustes Analysis algorithm in Python some time ago and was recently told to port it to OpenCV/C++. After finishing, I ran some tests, and for the same input instances the C++ code takes twice as long as the Python code (roughly 8 vs. 4 seconds, respectively; I repeat each test a thousand times to make sure I'm not measuring too short a period). I'm baffled by these results.

I've used gprof to try to understand what's going on, but I can't see much that looks wrong, besides the fact that cv::Mat::~Mat() takes 34.67% of the execution time and is called more than 100 times as often as any other function. I'm not sure what to do about that either, unless I'm supposed to replace cv::Mats with std::vectors or raw arrays, both of which would seem like bad practice to me.

void align(const cv::Mat& points, const cv::Mat& pointsRef, cv::Mat& res, cv::Mat& ops) {
    cv::Mat pts(points.rows, points.cols, CV_64FC1);
    cv::Mat ptsRef(points.rows, points.cols, CV_64FC1);
    points.copyTo(pts);
    pointsRef.copyTo(ptsRef);

    cv::Mat avgs = meanOfColumns(pts);
    for(int i = 0; i < avgs.cols; i++) {
        pts.col(i) -= avgs.col(i);
    }
    cv::Mat avgsR = meanOfColumns(ptsRef);
    for(int i = 0; i < avgsR.cols; i++) {
        ptsRef.col(i) -= avgsR.col(i);
    }

    cv::Mat x2(pts.rows, 1, CV_64FC1);
    cv::Mat y2(pts.rows, 1, CV_64FC1);
    cv::Mat x2R(pts.rows, 1, CV_64FC1);
    cv::Mat y2R(pts.rows, 1, CV_64FC1);
    cv::pow(pts.col(0), 2, x2);
    cv::pow(pts.col(1), 2, y2);
    cv::pow(ptsRef.col(0), 2, x2R);
    cv::pow(ptsRef.col(1), 2, y2R);
    cv::Mat sqrootP(pts.rows, 1, CV_64FC1);
    cv::Mat sqrootPR(pts.rows, 1, CV_64FC1);
    cv::sqrt(x2R + y2R, sqrootPR);
    cv::sqrt(x2 + y2, sqrootP);
    double offsetS = (cv::mean(sqrootPR) / cv::mean(sqrootP))[0];
    pts *= offsetS;

    cv::Mat rot(pts.rows, 1, CV_64FC1);
    cv::Mat rotR(pts.rows, 1, CV_64FC1);
    rot = arctan2(pts.col(1), pts.col(0));
    rotR = arctan2(ptsRef.col(1), ptsRef.col(0));
    double offsetR = -cv::mean((rot - rotR))[0];
    cv::Mat angRot(pts.rows, 1, CV_64FC1);
    angRot = rot + offsetR;
    cv::Mat dist(pts.rows, 1, CV_64FC1);
    cv::pow(pts.col(0), 2, x2);
    cv::pow(pts.col(1), 2, y2);
    cv::sqrt(x2 + y2, dist);
    copyColumn(dist.mul(cosine(angRot)), res, 0, 0);
    copyColumn(dist.mul(sine(angRot)), res, 0, 1);

    ops.at<double>(0, 0) = -avgs.at<double>(0, 0);
    ops.at<double>(0, 1) = -avgs.at<double>(0, 1);
    ops.at<double>(0, 2) = offsetS * cv::cos(offsetR / RADIANS_TO_DEGREES);
    ops.at<double>(0, 3) = offsetS * cv::sin(offsetR / RADIANS_TO_DEGREES);
}

This is the code to align 2 sets of points. It calls some functions that aren't shown, but they're simple and I can explain them if necessary, though I hope the names are enough to understand what they do.
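To make the arithmetic easier to follow, the first two steps of align (centering each point set on its mean, then computing the scale offset from the ratio of mean distances) can be sketched without OpenCV. This is only an illustration: the Point alias and the functions center, meanRadius and scaleOffset are hypothetical names, they are not the helpers omitted from the question, and they assume non-empty point sets.

```cpp
#include <array>
#include <cmath>
#include <vector>

// Hypothetical 2D point type used only for this sketch.
using Point = std::array<double, 2>;

// Subtracts the centroid from every point in place and returns the centroid.
// Assumes pts is not empty.
Point center(std::vector<Point>& pts) {
    Point mean{0.0, 0.0};
    for (const Point& p : pts) {
        mean[0] += p[0];
        mean[1] += p[1];
    }
    mean[0] /= pts.size();
    mean[1] /= pts.size();
    for (Point& p : pts) {
        p[0] -= mean[0];
        p[1] -= mean[1];
    }
    return mean;
}

// Mean Euclidean distance of the points from the origin.
// Assumes pts is not empty.
double meanRadius(const std::vector<Point>& pts) {
    double sum = 0.0;
    for (const Point& p : pts) {
        sum += std::hypot(p[0], p[1]);
    }
    return sum / pts.size();
}

// Scale factor that matches the mean radius of pts to that of ref.
// Both sets are expected to be centered already.
double scaleOffset(const std::vector<Point>& pts, const std::vector<Point>& ref) {
    return meanRadius(ref) / meanRadius(pts);
}
```

Multiplying every centered point by the value returned from scaleOffset corresponds to the `pts *= offsetS` line in the code above.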

I'm a casual C++ programmer, go easy on me guys.

It does seem like Ignacio Vazquez-Abrams has the right idea. A more concise/direct example:

#include <boost/date_time/posix_time/posix_time.hpp>
#include <cv.hpp>
#include <iostream>

using namespace boost::posix_time;

int main() {
    cv::Mat m1(1000, 1000, CV_64FC1);
    cv::Mat m2(1000, 1000, CV_64FC1);
    ptime firstValue( microsec_clock::local_time() );
    for(int i = 0; i < 10; i++) {
        cv::Mat m3 = m1 * m2;
    }
    ptime secondValue( microsec_clock::local_time() );
    time_duration diff = secondValue - firstValue;
    // total_microseconds() gives the whole duration; seconds() and fractional_seconds() are separate, unpadded fields
    std::cout << diff.total_microseconds() / 1e6 << " s" << std::endl;
}

That takes 14+ seconds on my machine. Now Python:

import datetime
import numpy as np

if __name__ == '__main__':
    print datetime.datetime.now()
    m1 = np.zeros((1000, 1000), dtype=float)
    m2 = np.zeros((1000, 1000), dtype=float)
    for i in range(1000):
        m3 = np.dot(m1, m2)
    print datetime.datetime.now()

That takes 4+ seconds, even though the C++ example runs the multiplication only 10 times, whereas the Python (Fortran-backed) one runs it 1000 times.

Well okay, update time.

I reviewed the Python code I was using and realized it was only loading a subset of the points (about 5%). That means my C++ tests were actually processing about 20 times more instances than the Python code, so the C++ code is around 10 times faster, since it took only twice as long. It still seems as if numpy has OpenCV beat in some operations, though.

asked Jul 13 '11 05:07 by friday

1 Answer

for(int i = 0; i < 10; i++) {
        cv::Mat m3 = m1 * m2;
}

This is pointless in C++: m3 is destroyed on each iteration of the loop, which is why you get all those destructor calls.
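The effect of declaring the result inside the loop can be made visible with a small stand-in type. The sketch below is only an illustration: Buffer is a hypothetical class, not cv::Mat, and all it does is count how often its destructor runs when the object is declared inside the loop versus declared once and reused.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap-backed matrix; this is NOT cv::Mat.
struct Buffer {
    static int destroyed;  // number of destructor calls so far
    std::vector<double> data;
    explicit Buffer(std::size_t n) : data(n, 0.0) {}
    ~Buffer() { ++destroyed; }
};
int Buffer::destroyed = 0;

// Result declared inside the loop: constructed and destroyed every iteration.
// Assumes n >= 1.
int destructionsWhenDeclaredInside(int iterations, std::size_t n) {
    Buffer::destroyed = 0;
    for (int i = 0; i < iterations; ++i) {
        Buffer result(n);
        result.data[0] = i;
    }
    return Buffer::destroyed;
}

// Result declared once and reused: destroyed a single time at scope exit.
// Assumes n >= 1.
int destructionsWhenReused(int iterations, std::size_t n) {
    Buffer::destroyed = 0;
    {
        Buffer result(n);
        for (int i = 0; i < iterations; ++i) {
            result.data[0] = i;
        }
    }
    return Buffer::destroyed;
}
```

For 10 iterations the first function reports 10 destructor calls and the second reports 1, which mirrors the destructor count seen in the profile.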

edit:

cv::Mat m3 = m1 * m2;

and

m3 = np.dot(m1, m2)

aren't the same thing. Have you tried comparing a cross product in NumPy or a dot product in OpenCV?

answered Oct 27 '22 00:10 by Martin Beckett
