OpenCV/C++ program slower than its numpy counterpart, what should I do?

Question

Some time ago I implemented the Procrustes analysis algorithm in Python, and I was recently told to port it to OpenCV/C++. After finishing the port I ran some tests, and for the same inputs the C++ code takes twice as long as the Python code (roughly 8 vs. 4 seconds; I repeat the tests a thousand times to make sure I'm not measuring too short an interval). I'm baffled by these results.

I've used gprof to try to understand what's going on, but I can't see much that looks wrong, apart from the fact that cv::Mat::~Mat() takes 34.67% of the execution time and is called 100+ times more often than any other function. I'm not sure what to do about that either, unless I'm supposed to replace cv::Mats with std::vectors or raw arrays, both of which seem like bad practice to me.

void align(const cv::Mat& points, const cv::Mat& pointsRef, cv::Mat& res, cv::Mat& ops) {
    cv::Mat pts(points.rows, points.cols, CV_64FC1);
    cv::Mat ptsRef(points.rows, points.cols, CV_64FC1);
    points.copyTo(pts);
    pointsRef.copyTo(ptsRef);

    cv::Mat avgs = meanOfColumns(pts);
    for(int i = 0; i < avgs.cols; i++) {
        pts.col(i) -= avgs.col(i);
    }
    cv::Mat avgsR = meanOfColumns(ptsRef);
    for(int i = 0; i < avgsR.cols; i++) {
        ptsRef.col(i) -= avgsR.col(i);
    }

    cv::Mat x2(pts.rows, 1, CV_64FC1);
    cv::Mat y2(pts.rows, 1, CV_64FC1);
    cv::Mat x2R(pts.rows, 1, CV_64FC1);
    cv::Mat y2R(pts.rows, 1, CV_64FC1);
    cv::pow(pts.col(0), 2, x2);
    cv::pow(pts.col(1), 2, y2);
    cv::pow(ptsRef.col(0), 2, x2R);
    cv::pow(ptsRef.col(1), 2, y2R);
    cv::Mat sqrootP(pts.rows, 1, CV_64FC1);
    cv::Mat sqrootPR(pts.rows, 1, CV_64FC1);
    cv::sqrt(x2R + y2R, sqrootPR);
    cv::sqrt(x2 + y2, sqrootP);
    double offsetS = (cv::mean(sqrootPR) / cv::mean(sqrootP))[0];
    pts *= offsetS;

    cv::Mat rot(pts.rows, 1, CV_64FC1);
    cv::Mat rotR(pts.rows, 1, CV_64FC1);
    rot = arctan2(pts.col(1), pts.col(0));
    rotR = arctan2(ptsRef.col(1), ptsRef.col(0));
    double offsetR = -cv::mean((rot - rotR))[0];
    cv::Mat angRot(pts.rows, 1, CV_64FC1);
    angRot = rot + offsetR;
    cv::Mat dist(pts.rows, 1, CV_64FC1);
    cv::pow(pts.col(0), 2, x2);
    cv::pow(pts.col(1), 2, y2);
    cv::sqrt(x2 + y2, dist);
    copyColumn(dist.mul(cosine(angRot)), res, 0, 0);
    copyColumn(dist.mul(sine(angRot)), res, 0, 1);

    ops.at<double>(0, 0) = -avgs.at<double>(0, 0);
    ops.at<double>(0, 1) = -avgs.at<double>(0, 1);
    ops.at<double>(0, 2) = offsetS * cv::cos(offsetR / RADIANS_TO_DEGREES);
    ops.at<double>(0, 3) = offsetS * cv::sin(offsetR / RADIANS_TO_DEGREES);
}

This is the code to align 2 sets of points. It calls some functions that aren't shown, but they're simple and I can explain them if necessary, though I hope the names are enough to understand what they do.
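For comparison, here is a minimal sketch of the same steps (center both sets, match the mean radius, rotate by the mean angle difference) written over plain std::vector in a few passes, with no temporary matrices. The names Point2, centerInPlace, meanRadius and alignSketch are made up for illustration, and the angles are in radians, which is an assumption: the helpers in the question appear to work in degrees. It has not been benchmarked against the cv::Mat version.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical point type used only for this sketch.
struct Point2 { double x, y; };

// Subtracts the centroid from every point in place and returns it.
Point2 centerInPlace(std::vector<Point2>& pts) {
    Point2 c{0.0, 0.0};
    for (const Point2& p : pts) { c.x += p.x; c.y += p.y; }
    c.x /= pts.size();
    c.y /= pts.size();
    for (Point2& p : pts) { p.x -= c.x; p.y -= c.y; }
    return c;
}

// Mean distance of the (already centered) points from the origin.
double meanRadius(const std::vector<Point2>& pts) {
    double sum = 0.0;
    for (const Point2& p : pts) sum += std::hypot(p.x, p.y);
    return sum / pts.size();
}

// Follows the steps of the question's align(): center, scale, rotate.
// Angles are in radians (an assumption), and the mean angle difference
// is taken naively, so wrap-around at +/-pi is not handled.
std::vector<Point2> alignSketch(std::vector<Point2> pts, std::vector<Point2> ref) {
    assert(!pts.empty() && pts.size() == ref.size());
    centerInPlace(pts);
    centerInPlace(ref);

    const double scale = meanRadius(ref) / meanRadius(pts);

    double angleSum = 0.0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        angleSum += std::atan2(pts[i].y, pts[i].x) - std::atan2(ref[i].y, ref[i].x);
    }
    const double rotation = -angleSum / pts.size();

    // Rebuild each point from its scaled radius and rotated angle.
    std::vector<Point2> res(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const double dist = scale * std::hypot(pts[i].x, pts[i].y);
        const double angle = std::atan2(pts[i].y, pts[i].x) + rotation;
        res[i] = {dist * std::cos(angle), dist * std::sin(angle)};
    }
    return res;
}
```

The result is expressed in the centered frame of the reference set, as res is in the question. If all input points coincide, meanRadius returns zero and the scale is undefined, so real code would need to guard against that.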

I'm a casual C++ programmer, go easy on me guys.

It does seem like Ignacio Vazquez-Abrams has the right idea. A more concise/direct example:

#include <boost/date_time/posix_time/posix_time.hpp>
#include <cv.hpp>
#include <iostream>

using namespace boost::posix_time;

int main() {
    cv::Mat m1(1000, 1000, CV_64FC1);
    cv::Mat m2(1000, 1000, CV_64FC1);
    ptime firstValue( microsec_clock::local_time() );
    for(int i = 0; i < 10; i++) {
        cv::Mat m3 = m1 * m2;
    }
    ptime secondValue( microsec_clock::local_time() );
    time_duration diff = secondValue - firstValue;
    // Print the total elapsed time in seconds; seconds() + "." + fractional_seconds() drops leading zeros.
    std::cout << diff.total_microseconds() / 1e6 << " s" << std::endl;
}

That takes 14+ seconds on my machine. Now Python:

import datetime
import numpy as np

if __name__ == '__main__':
    print(datetime.datetime.now())
    m1 = np.zeros((1000, 1000), dtype=float)
    m2 = np.zeros((1000, 1000), dtype=float)
    for i in range(1000):
        m3 = np.dot(m1, m2)
    print(datetime.datetime.now())

That takes 4+ seconds, even though the C++ example runs the product only 10 times, whereas the Python (Fortran-backed) one runs it 1000 times.
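Since the two loops run a different number of iterations, the fair figure to compare is time per iteration. A minimal sketch of a timing helper using std::chrono from the standard library (the name secondsPerIteration is made up for illustration):

```cpp
#include <cassert>
#include <chrono>

// Runs work() the given number of times and returns the average
// wall-clock time per call, in seconds.
template <typename Work>
double secondsPerIteration(Work work, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iterations;
}
```

With the figures quoted above, that works out to roughly 14 s / 10 = 1.4 s per product in C++ against 4 s / 1000 = 0.004 s per product in Python. steady_clock is used because it is monotonic, so the measurement is not affected if the system clock is adjusted during the run.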

Well okay, update time.

I reviewed the Python code I was using and realized it was only loading a subset of the points (about 5%). That means my C++ tests were actually processing about 20 times more instances than the Python code, so the C++ code is really around 10 times faster, since it only took twice as long. It still seems as if numpy has OpenCV beat in some operations, though.

Martin Beckett · Accepted Answer

for(int i = 0; i < 10; i++) {
    cv::Mat m3 = m1 * m2;
}

This is totally pointless in C++: m3 is destroyed on each iteration of the loop, which is why you get all those destructor calls.
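To see the effect in isolation, here is a small sketch with a hypothetical CountedBuffer type standing in for a matrix. It only counts destructor calls; it says nothing about how cv::Mat manages memory internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix type; counts destructor calls.
struct CountedBuffer {
    static int destroyed;
    std::vector<double> data;
    explicit CountedBuffer(std::size_t n) : data(n) {}
    ~CountedBuffer() { ++destroyed; }
};
int CountedBuffer::destroyed = 0;

// Declares the buffer inside the loop: one destructor call per iteration.
int destructionsInsideLoop(int iterations, std::size_t n) {
    assert(n > 0);
    CountedBuffer::destroyed = 0;
    for (int i = 0; i < iterations; ++i) {
        CountedBuffer result(n);
        result.data[0] = i;
    }
    return CountedBuffer::destroyed;
}

// Declares the buffer once and reuses it: one destructor call in total.
int destructionsHoisted(int iterations, std::size_t n) {
    assert(n > 0);
    CountedBuffer::destroyed = 0;
    {
        CountedBuffer result(n);
        for (int i = 0; i < iterations; ++i) {
            result.data[0] = i;
        }
    }
    return CountedBuffer::destroyed;
}
```

For 10 iterations the first function reports 10 destructions and the second reports 1. In the question's align(), short-lived objects presumably also come from expressions such as x2 + y2 and from the column views returned by col(), so a high destructor count on its own does not show where the time is going.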

edit:

cv::Mat m3 = m1 * m2;

and

m3 = np.dot(m1, m2)

aren't the same thing. Have you tried comparing a cross product in NumPy or a dot product in OpenCV?
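When comparing the two libraries it helps to be sure which operation each call performs. As far as I know, NumPy documents np.dot on two 2-D arrays as a matrix product, operator* on two cv::Mat objects is also a matrix product, and cv::Mat::dot returns a single scalar. The difference between the two kinds of operation can be sketched on plain nested vectors (Matrix, matrixProduct and dotProduct are names made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Matrix product: (n x k) times (k x m) gives an (n x m) matrix.
Matrix matrixProduct(const Matrix& a, const Matrix& b) {
    assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    Matrix c(a.size(), std::vector<double>(b[0].size(), 0.0));
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t k = 0; k < b.size(); ++k)
            for (std::size_t j = 0; j < b[0].size(); ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Dot product of two equally sized matrices treated as flat vectors:
// the sum of element-wise products, a single number.
double dotProduct(const Matrix& a, const Matrix& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(a[i].size() == b[i].size());
        for (std::size_t j = 0; j < a[i].size(); ++j)
            sum += a[i][j] * b[i][j];
    }
    return sum;
}
```

For two n x n matrices the naive matrix product does on the order of n^3 multiplications while the dot product does n^2, so timing one against the other is not a like-for-like comparison.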

Tags: c++, python, image-processing, opencv, numpy, friday
