I'm trying to perform very simple (LUT-like) operations on a 16-bit grayscale OpenCV Mat in a way that is efficient and doesn't slow down the debugger.
While there is a very detailed page in the documentation addressing exactly this issue, it fails to point out that most of those methods are only available on 8-bit images (including the perfect, optimized LUT function).
I tried the following methods:
uchar* p = mat_depth.data;
for (unsigned int i = 0; i < depth_width * depth_height * sizeof(unsigned short); ++i)
{
    *p = ...;
    ++p;
}
Really fast, but unfortunately it only supports uchar (just like LUT).
int i = 0;
for (int row = 0; row < depth_height; row++)
{
    for (int col = 0; col < depth_width; col++)
    {
        i = mat_depth.at<short>(row, col);
        i = ..
        mat_depth.at<short>(row, col) = i;
    }
}
Adapted from this answer: https://stackoverflow.com/a/27225293/518169. Didn't work for me, and it was very slow.
cv::MatIterator_<ushort> it, end;
for (it = mat_depth.begin<ushort>(), end = mat_depth.end<ushort>(); it != end; ++it)
{
    *it = ...;
}
Works well, however it uses a lot of CPU and makes the debugger super slow.
This answer https://stackoverflow.com/a/27099697/518169 points to the source code of the built-in LUT function, but it only covers advanced optimization techniques like IPP and OpenCL.
What I'm looking for is a very simple loop like the first code, but for ushorts.
What method do you recommend for solving this problem? I'm not looking for extreme optimization, just something on par with the performance of the single for-loop over .data.
I implemented Michael's and Kornel's suggestions and benchmarked them in both release and debug mode.
Code:
cv::Mat LUT_16(cv::Mat &mat, ushort table[])
{
    // Assumes a continuous, CV_16U Mat.
    int limit = mat.rows * mat.cols;
    ushort* p = mat.ptr<ushort>(0);
    for (int i = 0; i < limit; ++i)
    {
        p[i] = table[p[i]];
    }
    return mat;
}

cv::Mat LUT_16_reinterpret_cast(cv::Mat &mat, ushort table[])
{
    int limit = mat.rows * mat.cols;
    ushort* ptr = reinterpret_cast<ushort*>(mat.data);
    for (int i = 0; i < limit; i++, ptr++)
    {
        *ptr = table[*ptr];
    }
    return mat;
}

cv::Mat LUT_16_if(cv::Mat &mat)
{
    // The transform written out with a branch instead of a table lookup.
    int limit = mat.rows * mat.cols;
    ushort* ptr = reinterpret_cast<ushort*>(mat.data);
    for (int i = 0; i < limit; i++, ptr++)
    {
        if (*ptr == 0)
        {
            *ptr = 65535;
        }
        else
        {
            *ptr *= 100;
        }
    }
    return mat;
}

ushort* tablegen_zero()
{
    // Identity table, except 0 maps to 65535.
    static ushort table[65536];
    for (int i = 0; i < 65536; ++i)
    {
        table[i] = (i == 0) ? 65535 : i;
    }
    return table;
}
The results are the following (release/debug):
So the conclusion is that reinterpret_cast is faster by 9% in release mode, while the ptr version is faster by 4% in debug mode.
It's also interesting that applying the if-based transform directly instead of a LUT makes it slower by only 0.065 ms.
Specs: streaming 640×480 16-bit grayscale images, Visual Studio 2013, i7-4750HQ.
The OpenCV implementation is based on polymorphism and runtime dispatching over templates. In OpenCV, the use of templates is limited to a fixed set of primitive data types. That is, array elements should have one of the following types: 8-bit unsigned (uchar), 8-bit signed (schar), 16-bit unsigned (ushort), 16-bit signed (short), 32-bit signed (int), 32-bit floating-point (float), 64-bit floating-point (double), or a tuple of several elements where all elements have one of these types.
In case your cv::Mat is continuous, you can use pointer arithmetic to walk the whole data buffer; just make sure you use the pointer type that matches your cv::Mat's element type. Furthermore, keep in mind that cv::Mats are not always continuous (a Mat can be a ROI, padded, or created from a pixel pointer), and iterating over such a Mat with a flat pointer will read garbage or crash.
An example loop:
cv::Mat cvmat16sc1 = cv::Mat::eye(10, 10, CV_16SC1);
if (cvmat16sc1.data)
{
    if (!cvmat16sc1.isContinuous())
    {
        cvmat16sc1 = cvmat16sc1.clone(); // clone() always yields a continuous copy
    }
    short* ptr = reinterpret_cast<short*>(cvmat16sc1.data);
    for (int i = 0; i < cvmat16sc1.cols * cvmat16sc1.rows; i++, ptr++)
    {
        if (*ptr == 1)
            std::cout << i << ": " << *ptr << std::endl;
    }
}