(opencv rc1) What causes Mat multiplication to be 20x slower than per-pixel multiplication?

Tags: c++, opencv, java-native-interface, arm, neon

// 700 ms
cv::Mat in(height,width,CV_8UC1);
in /= 4;

Replaced with

//40 ms
cv::Mat in(height,width,CV_8UC1);
for (int y=0; y < in.rows; ++y)
{
    unsigned char* ptr = in.data + y*in.step1();
    for (int x=0; x < in.cols; ++x)
    {
        ptr[x] /= 4;
    }
}

What can cause such behavior? Is it because OpenCV "promotes" a Mat-by-scalar multiplication to a Mat-by-Mat multiplication, or is it an ARM-specific optimization that fails? (NEON is enabled.)


asked May 11 '15 11:05

Boyko Perfanov

2 Answers

This is a very old issue (I reported it a couple of years ago): many basic operations take extra time, not just division but also addition, abs, etc. I don't know the real reason for this behavior. What is even weirder is that operations that should take more time, like addWeighted, are actually very efficient. Try this one:

addWeighted(in, 1.0/4, in, 0, 0, in);

It performs several operations per pixel, yet it runs a few times faster than either the add function or the loop implementation.

Here is my report on the bug tracker.


answered Oct 10 '22 02:10

Michael Burdinov

I tried the same thing, measuring CPU time with clock():

#include <ctime>
#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    clock_t startTime;
    clock_t endTime;

    int height = 1024;
    int width = 1024;

    // Variants 1 and 2: OpenCV operators (the question reports ~700 ms for "in /= 4")
    cv::Mat in(height, width, CV_8UC1, cv::Scalar(255));
    std::cout << "value: " << (int)in.at<unsigned char>(0,0) << std::endl;

    cv::Mat out(height, width, CV_8UC1);

    startTime = clock();
    out = in / 4;
    endTime = clock();
    std::cout << "1: " << (float)(endTime-startTime)/(float)CLOCKS_PER_SEC << std::endl;
    std::cout << "value: " << (int)out.at<unsigned char>(0,0) << std::endl;

    startTime = clock();
    in /= 4;
    endTime = clock();
    std::cout << "2: " << (float)(endTime-startTime)/(float)CLOCKS_PER_SEC << std::endl;
    std::cout << "value: " << (int)in.at<unsigned char>(0,0) << std::endl;

    // Variant 3: per-pixel loop (the question reports ~40 ms)
    cv::Mat in2(height, width, CV_8UC1, cv::Scalar(255));

    startTime = clock();
    for (int y = 0; y < in2.rows; ++y)
    {
        // For CV_8UC1, in2.ptr(y) is the same address as in2.data + y*in2.step1()
        unsigned char* ptr = in2.ptr(y);
        for (int x = 0; x < in2.cols; ++x)
        {
            ptr[x] /= 4;
        }
    }
    endTime = clock();   // stop the timer before any console output
    std::cout << "3: " << (float)(endTime-startTime)/(float)CLOCKS_PER_SEC << std::endl;
    std::cout << "value: " << (int)in2.at<unsigned char>(0,0) << std::endl;

    // Keep the console open until a key is pressed in the window
    cv::namedWindow("...");
    cv::waitKey(0);
}

With these results:

value: 255
1: 0.016
value: 64
2: 0.016
value: 64
3: 0.003
value: 63

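You can see that the results differ (64 vs. 63), probably because OpenCV's Mat division performs a floating-point division and rounds to the nearest integer (255 / 4 = 63.75, rounded to 64), while your faster version uses integer division, which truncates (255 / 4 gives 63). Integer division is faster, but it gives a different result.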

In addition, OpenCV applies a saturate_cast to every result, but I guess most of the difference in computational load comes from the double-precision division.

answered Oct 10 '22 02:10

Micka
