
Efficient way to loop through pixels of 16-bit Mat in OpenCV

I'm trying to perform very simple (LUT-like) operations on a 16-bit grayscale OpenCV Mat in a way that is efficient and doesn't slow down the debugger.

While there is a very detailed page in the documentation addressing exactly this issue, it fails to point out that most of those methods are only available on 8-bit images (including the perfect, optimized LUT function).

I tried the following methods:

uchar* p = mat_depth.data;
for (unsigned int i = 0; i < depth_width * depth_height * sizeof(unsigned short); ++i)
{
    *p = ...;
    ++p;
}

Really fast, but unfortunately it only supports uchar (just like LUT).


int i = 0;
for (int row = 0; row < depth_height; row++)
{
    for (int col = 0; col < depth_width; col++)
    {
        i = mat_depth.at<short>(row, col);
        i = ..
        mat_depth.at<short>(row, col) = i;
    }
}

Adapted from this answer: https://stackoverflow.com/a/27225293/518169. Didn't work for me, and it was very slow.


cv::MatIterator_<ushort> it, end;
for (it = mat_depth.begin<ushort>(), end = mat_depth.end<ushort>(); it != end; ++it)
{
    *it = ...;
}

Works well, however it uses a lot of CPU and makes the debugger super slow.


This answer https://stackoverflow.com/a/27099697/518169 points to the source code of the built-in LUT function, but it only mentions advanced optimization techniques such as IPP and OpenCL.

What I'm looking for is a very simple loop like the first code, but for ushorts.

What method do you recommend for solving this problem? I'm not looking for extreme optimization, just something on par with the performance of the single-for-loop on .data.

Asked by hyperknot, Feb 10 '15 03:02


2 Answers

I implemented Michael's and Kornel's suggestion and benchmarked them both in release and debug modes.

Code:

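// Each function below modifies `mat` in place and assumes a continuous,
// single-channel CV_16UC1 matrix.
cv::Mat LUT_16(cv::Mat &mat, ushort table[])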
{
    int limit = mat.rows * mat.cols;

    ushort* p = mat.ptr<ushort>(0);
    for (int i = 0; i < limit; ++i)
    {
        p[i] = table[p[i]];
    }
    return mat;
}

cv::Mat LUT_16_reinterpret_cast(cv::Mat &mat, ushort table[])
{
    int limit = mat.rows * mat.cols;

    ushort* ptr = reinterpret_cast<ushort*>(mat.data);
    for (int i = 0; i < limit; i++, ptr++)
    {
        *ptr = table[*ptr];
    }
    return mat;
}

cv::Mat LUT_16_if(cv::Mat &mat)
{
    int limit = mat.rows * mat.cols;

    ushort* ptr = reinterpret_cast<ushort*>(mat.data);
    for (int i = 0; i < limit; i++, ptr++)
    {
        if (*ptr == 0){
            *ptr = 65535;
        }
        else{
            *ptr *= 100;
        }
    }
    return mat;
}

ushort* tablegen_zero()
{
    static ushort table[65536];
    for (int i = 0; i < 65536; ++i)
    {
        if (i == 0)
        {
            table[i] = 65535;
        }
        else
        {
            table[i] = i;
        }
    }
    return table;
}

The results are the following (release/debug):

  • LUT_16: 0.202 ms / 0.773 ms
  • LUT_16_reinterpret_cast: 0.184 ms / 0.801 ms
  • LUT_16_if: 0.249 ms / 0.860 ms

So the conclusion is that the reinterpret_cast version is about 9% faster in release mode, while the ptr version is about 4% faster in debug mode.

It's also interesting that running the if-based function directly instead of applying a LUT is only 0.065 ms slower (0.249 ms vs. 0.184 ms in release mode).
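
Note that tablegen_zero leaves non-zero values unchanged, while LUT_16_if multiplies them by 100, so the two code paths above do not compute the same result. As a minimal sketch (not the code that was benchmarked), a table matching LUT_16_if could be built once from any per-pixel function and then applied with the same single loop. The names zero_or_times_100, build_table and apply_table are hypothetical, and a plain std::vector<std::uint16_t> stands in for the continuous pixel buffer of a CV_16UC1 cv::Mat.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-pixel operation mirroring LUT_16_if: 0 becomes 65535, everything else
// is multiplied by 100 and wraps modulo 65536, like `*ptr *= 100` on a ushort.
inline std::uint16_t zero_or_times_100(std::uint16_t v)
{
    if (v == 0) return 65535;
    return static_cast<std::uint16_t>(static_cast<std::uint32_t>(v) * 100u);
}

// Evaluate `op` once for each of the 65536 possible 16-bit values.
template <typename Op>
std::vector<std::uint16_t> build_table(Op op)
{
    std::vector<std::uint16_t> table(65536);
    for (std::uint32_t i = 0; i < 65536; ++i)
        table[i] = op(static_cast<std::uint16_t>(i));
    return table;
}

// Apply the table in place to a flat (continuous) run of 16-bit pixels.
inline void apply_table(std::uint16_t* pixels, std::size_t count,
                        const std::vector<std::uint16_t>& table)
{
    assert(table.size() == 65536);
    for (std::size_t i = 0; i < count; ++i)
        pixels[i] = table[pixels[i]];
}
```

With a cv::Mat, the pointer would come from mat.ptr<ushort>(0) and the count from mat.total(), provided mat.isContinuous() holds.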

Specs: streaming 640x480x16-bit grayscale image, Visual Studio 2013, i7 4750HQ.
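
For anyone who wants to repeat the measurement, here is a rough sketch of how per-call timings like these could be taken with std::chrono. This is an assumption about a reasonable harness, not the one used for the numbers above; average_ms and time_lut are hypothetical names, and a std::vector<std::uint16_t> of 640 * 480 elements stands in for one frame.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Average wall-clock time of one call to `fn`, in milliseconds, over `runs` calls.
template <typename Fn>
double average_ms(Fn fn, int runs)
{
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int r = 0; r < runs; ++r)
        fn();
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / runs;
}

// Time an in-place table lookup over `frame`. The caller owns the frame, so
// the result of the loop stays observable and is not optimized away.
inline double time_lut(std::vector<std::uint16_t>& frame,
                       const std::vector<std::uint16_t>& table, int runs)
{
    assert(table.size() == 65536);
    return average_ms([&] {
        for (std::uint16_t& px : frame)
            px = table[px];
    }, runs);
}
```

Averaging over many runs smooths out timer resolution and cache warm-up, which matters when a single call takes only a fraction of a millisecond.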

Answered by hyperknot, Sep 19 '22 00:09


OpenCV's implementation is based on polymorphism and runtime dispatching rather than templates. In the current OpenCV version the use of templates is limited, and the library operates on a fixed set of primitive data types. That is, array elements should have one of the following types:

  • 8-bit unsigned integer (uchar)
  • 8-bit signed integer (schar)
  • 16-bit unsigned integer (ushort)
  • 16-bit signed integer (short)
  • 32-bit signed integer (int)
  • 32-bit floating-point number (float)
  • 64-bit floating-point number (double)
  • a tuple of several elements where all elements have the same type (one of the above).

If your cv::Mat is continuous, you can use pointer arithmetic to walk the whole data buffer, as long as the pointer type matches the element type of the cv::Mat. Keep in mind that a cv::Mat is not always continuous (it can be a ROI, padded, or created from a pixel pointer); iterating over such a matrix as one flat buffer touches the wrong memory and can crash.

An example loop:

cv::Mat cvmat16sc1 = cv::Mat::eye(10, 10, CV_16SC1);

if (cvmat16sc1.data)
{
    if (!cvmat16sc1.isContinuous())
    {
        cvmat16sc1 = cvmat16sc1.clone();
    }

    short* ptr = reinterpret_cast<short*>(cvmat16sc1.data);
    for (int i = 0; i < cvmat16sc1.cols * cvmat16sc1.rows; i++, ptr++)
    {
        if (*ptr == 1)
            std::cout << i << ": " << *ptr << std::endl;
    }
}
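
The clone() in the example above yields a flat buffer, but it also detaches the data from the original matrix, so in-place changes would not reach a parent image when the input is a ROI. A possible alternative, sketched here under assumptions, is to take a fresh pointer for every row and only walk `cols` elements of it. The function name apply_table_strided is hypothetical and the code works on a raw buffer so it stays self-contained; with a cv::Mat, the row pointer would be mat.ptr<ushort>(row) and the stride mat.step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Apply a 65536-entry table in place to a 16-bit image whose rows may be
// padded. `step_bytes` is the distance between row starts in bytes.
inline void apply_table_strided(unsigned char* data, int rows, int cols,
                                std::size_t step_bytes,
                                const std::vector<std::uint16_t>& table)
{
    assert(table.size() == 65536);
    assert(step_bytes % sizeof(std::uint16_t) == 0);
    assert(step_bytes >= static_cast<std::size_t>(cols) * sizeof(std::uint16_t));
    for (int row = 0; row < rows; ++row)
    {
        // Start of this row; padding after the last column is never touched.
        std::uint16_t* p = reinterpret_cast<std::uint16_t*>(data + row * step_bytes);
        for (int col = 0; col < cols; ++col)
            p[col] = table[p[col]];
    }
}
```

The inner loop is still a plain pointer walk, so the per-pixel cost should be close to the flat version; the only extra work is one address computation per row.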

Answered by Kornel, Sep 22 '22 00:09