OpenCV C++ multithreading speedups

Question

For the following code, here is a bit of context.

Mat img0; // 1280x960 grayscale

--

timer.start();
for (int i = 0; i < img0.rows; i++)
{
    vector<double> v;
    uchar* p = img0.ptr<uchar>(i);
    for (int j = 0; j < img0.cols; ++j)
    {
        v.push_back(p[j]);
    }
}
cout << "Single thread " << timer.end() << endl;

and

timer.start();
concurrency::parallel_for(0, img0.rows, [&img0](int i) {
    vector<double> v;
    uchar* p = img0.ptr<uchar>(i);
    for (int j = 0; j < img0.cols; ++j)
    {
        v.push_back(p[j]);
    }
});
cout << "Multi thread " << timer.end() << endl;

The result:

Single thread 0.0458856
Multi thread 0.0329856

The speedup is hardly noticeable.

My processor is Intel i5 3.10 GHz

RAM 8 GB DDR3

EDIT

I also tried a slightly different approach.

// `split` is my custom function; in this case it splits `img0` into two images, its left and right halves.
vector<Mat> imgs = split(img0, 2, 1);

--

timer.start();
concurrency::parallel_for(0, (int)imgs.size(), [imgs](int i) {
    Mat img = imgs[i];
    vector<double> v;
    for (int row = 0; row < img.rows; row++)
    {
        uchar* p = img.ptr<uchar>(row);
        for (int col = 0; col < img.cols; ++col)
        {
            v.push_back(p[col]);
        }
    }

});
cout << " Multi thread Sectored " << timer.end() << endl;

And I get a much better result:

Multi thread Sectored 0.0232881

So, it looks like I was creating 960 threads or something when I ran

parallel_for(0, img0.rows, ...

And that didn't work well.
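
One way to probe this hypothesis without splitting the image by hand is to give each worker one contiguous band of rows, so that the number of tasks matches the number of cores rather than the number of rows. The sketch below is only an illustration under assumptions: `GrayImage` and `process_in_bands` are hypothetical names, a plain byte buffer stands in for `cv::Mat`, `std::thread` is swapped in for `concurrency::parallel_for`, and the `reserve` call is an addition that the loops above do not have.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a grayscale cv::Mat: row-major bytes, no padding.
struct GrayImage {
    int rows;
    int cols;
    std::vector<unsigned char> data;
    const unsigned char* row(int r) const {
        return data.data() + static_cast<std::size_t>(r) * cols;
    }
};

// Copies every pixel into a vector<double>, one contiguous band of rows per
// worker thread. Returns how many values each band produced.
std::vector<std::size_t> process_in_bands(const GrayImage& img, int bands) {
    bands = std::max(1, std::min(bands, img.rows));  // always at least one band
    std::vector<std::size_t> counts(bands, 0);
    std::vector<std::thread> workers;
    const int rows_per_band = (img.rows + bands - 1) / bands;  // ceiling division
    for (int b = 0; b < bands; ++b) {
        const int first = std::min(img.rows, b * rows_per_band);
        const int last = std::min(img.rows, first + rows_per_band);
        workers.emplace_back([&img, &counts, b, first, last] {
            std::vector<double> v;
            // Assumption: reserving up front avoids repeated reallocation.
            v.reserve(static_cast<std::size_t>(last - first) * img.cols);
            for (int r = first; r < last; ++r) {
                const unsigned char* p = img.row(r);
                for (int c = 0; c < img.cols; ++c) {
                    v.push_back(p[c]);
                }
            }
            counts[b] = v.size();  // each worker writes only its own slot
        });
    }
    for (auto& t : workers) {
        t.join();
    }
    return counts;
}
```

The band count could be taken from `std::thread::hardware_concurrency()`, which may return 0 when the value is unknown, hence the clamp to at least one band. Whether this is faster than the per-row version would still have to be measured.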

(I must add that Kenney's comment is correct. Do not attach too much significance to the specific numbers I stated here. When measuring intervals as small as these, the results vary a lot from run to run. But in general, what I described in the edit, splitting the image in half, improved performance compared with the old approach.)
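
Because single measurements of a few tens of milliseconds are noisy, one common remedy is to repeat the measurement and report the median. The following is a minimal sketch, not the `timer` object used above: `median_seconds` is a hypothetical helper built on `std::chrono::steady_clock`.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Runs `work` `repeats` times and returns the median wall-clock time in seconds.
template <typename Work>
double median_seconds(Work work, int repeats) {
    if (repeats < 1) {
        repeats = 1;  // guard so that there is always at least one sample
    }
    std::vector<double> samples;
    samples.reserve(repeats);
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Each of the loops above could be wrapped in a lambda and passed to this helper with, say, 50 repeats, so that the single-threaded and multi-threaded variants are compared on medians rather than on one run each.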

Martin Bonner supports Monica · Accepted Answer

I think your problem is that you are limited by memory bandwidth. Your second snippet is basically reading from the whole of the image, and that has got to come out of main memory into cache (or out of L2 cache into L1 cache).

You need to arrange your code so that all four cores are working on the same bit of memory at once (I presume you are not actually trying to optimize this code - it is just a simple example).

Edit: Insert crucial "not" in last parenthetical remark.

Tags:

c++

multithreading

opencv

ancajic
