I've been working with OpenCV and Apple's Accelerate framework, and I'm finding Accelerate's performance slow and Apple's documentation limited. Take this function, for example:
void equalizeHistogram(const cv::Mat &planar8Image, cv::Mat &equalizedImage)
{
    cv::Size size = planar8Image.size();
    vImage_Buffer planarImageBuffer = {
        .width = static_cast<vImagePixelCount>(size.width),
        .height = static_cast<vImagePixelCount>(size.height),
        .rowBytes = planar8Image.step,
        .data = planar8Image.data
    };
    vImage_Buffer equalizedImageBuffer = {
        .width = static_cast<vImagePixelCount>(size.width),
        .height = static_cast<vImagePixelCount>(size.height),
        .rowBytes = equalizedImage.step,
        .data = equalizedImage.data
    };
    TIME_START(VIMAGE_EQUALIZE_HISTOGRAM);
    vImage_Error error = vImageEqualization_Planar8(&planarImageBuffer, &equalizedImageBuffer, kvImageNoFlags);
    TIME_END(VIMAGE_EQUALIZE_HISTOGRAM);
    if (error != kvImageNoError) {
        NSLog(@"%s, vImage error %zd", __PRETTY_FUNCTION__, error);
    }
}
This call takes roughly 20ms, which makes it unusable in my application. Histogram equalization may be inherently slow, but I've also tested BGRA->Grayscale conversion and found that OpenCV does it in ~5ms while vImage takes ~20ms.
While testing other functions, I found a project that built a simple slider app around a blur function (gist), which I cleaned up to use as a test. That also takes ~20ms.
Is there some trick to getting these functions to be faster?
To use vImage with OpenCV, pass a reference to your OpenCV matrix to a method like this one:
long contrastStretch_Accelerate(const Mat& src, Mat& dst) {
    vImagePixelCount rows = static_cast<vImagePixelCount>(src.rows);
    vImagePixelCount cols = static_cast<vImagePixelCount>(src.cols);
    vImage_Buffer _src = { src.data, rows, cols, src.step };
    vImage_Buffer _dst = { dst.data, rows, cols, dst.step };
    vImage_Error err;
    err = vImageContrastStretch_ARGB8888( &_src, &_dst, 0 );
    return err;
}
The call to this method, from your OpenCV code block, looks like this:
- (void)processImage:(Mat&)image;
{
    contrastStretch_Accelerate(image, image);
}
It's that simple. Because the vImage_Buffer structs only point at the matrix data, there's no "deep copying" of any kind, so the wrapping itself adds essentially no overhead, all questions of context and other related performance considerations aside (I can help you with those, too).
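To illustrate why wrapping the matrix costs nothing, here is a minimal plain C++ sketch. `BufferView`, `wrapPixels` and `invertPlanar8` are hypothetical stand-ins that only mirror the field order of vImage_Buffer (data, height, width, rowBytes); they are not Accelerate or OpenCV types. What it tries to show is that the view shares the caller's memory, and that rowBytes (OpenCV's step) may be larger than the row width when rows are padded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in mirroring the field order of vImage_Buffer.
// It is NOT the Accelerate type; it only illustrates the idea.
struct BufferView {
    void*       data;      // points at memory owned by someone else
    std::size_t height;    // number of rows
    std::size_t width;     // pixels per row
    std::size_t rowBytes;  // bytes from the start of one row to the next
};

// Wrap existing pixel memory. No pixels are copied.
BufferView wrapPixels(std::uint8_t* pixels, std::size_t rows,
                      std::size_t cols, std::size_t stepBytes) {
    return BufferView{ pixels, rows, cols, stepBytes };
}

// Invert every 8-bit pixel in place, honouring the row stride so that
// any padding bytes at the end of a row are left untouched.
void invertPlanar8(const BufferView& view) {
    assert(view.rowBytes >= view.width);
    auto* base = static_cast<std::uint8_t*>(view.data);
    for (std::size_t y = 0; y < view.height; ++y) {
        std::uint8_t* row = base + y * view.rowBytes;
        for (std::size_t x = 0; x < view.width; ++x) {
            row[x] = static_cast<std::uint8_t>(255 - row[x]);
        }
    }
}
```

For a 2x3 image stored with a 4-byte stride, calling invertPlanar8 on the view changes the original array directly and leaves the one padding byte in each row alone, which is the same relationship a cv::Mat and a vImage_Buffer built from its data and step have.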
SIDENOTE: Did you know that you have to change the channel order when mixing OpenCV with vImage? OpenCV typically stores pixels as BGRA, while vImage's ARGB8888 functions assume ARGB. So, prior to calling any vImage functions on an OpenCV matrix, call:
const uint8_t map[4] = { 3, 2, 1, 0 };
err = vImagePermuteChannels_ARGB8888(&_img, &_img, map, kvImageNoFlags);
if (err != kvImageNoError)
    NSLog(@"vImagePermuteChannels_ARGB8888 error: %ld", err);
Make the same call, map and all, afterwards to return the image to the channel order proper for an OpenCV matrix; reversing the channels twice puts them back where they started.
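As a rough illustration of what the { 3, 2, 1, 0 } map does, here is a plain C++ sketch. `permuteChannels` is a hypothetical helper written for this example, not the vImage routine; it assumes the convention that destination channel i is taken from source channel map[i], and interleaved 8-bit pixels with four channels each.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain C++ stand-in for a 4-channel permute (not the vImage call).
// For every pixel, destination channel i is read from source channel map[i].
void permuteChannels(std::vector<std::uint8_t>& pixels,
                     const std::array<std::uint8_t, 4>& map) {
    assert(pixels.size() % 4 == 0);
    for (std::size_t p = 0; p < pixels.size(); p += 4) {
        // Copy the source pixel first so the in-place write is safe.
        const std::uint8_t src[4] = { pixels[p], pixels[p + 1],
                                      pixels[p + 2], pixels[p + 3] };
        for (std::size_t i = 0; i < 4; ++i) {
            pixels[p + i] = src[map[i]];
        }
    }
}
```

Applied to one BGRA pixel {10, 20, 30, 255}, the reversing map gives {255, 30, 20, 10}, which is ARGB order. Applying it a second time restores the original bytes, which is why the same map works in both directions.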
One thing that is critical to vImage performance is reusing vImage_Buffers. I can't say how many times I read hints to this effect in Apple's limited documentation, but I was definitely not listening.
In the aforementioned blur code example, I reworked the test app to set up the vImage_Buffer input and output buffers once per image rather than once for each call to boxBlur. I dropped <10ms per call, which made a noticeable difference in response time.
It also appears that Accelerate needs time to warm up before you start seeing performance improvements. The first call to this method took 34ms.
- (UIImage *)boxBlurWithSize:(int)boxSize
{
    vImage_Error error;
    error = vImageBoxConvolve_ARGB8888(&_inputImageBuffer,
                                       &_outputImageBuffer,
                                       NULL,
                                       0,
                                       0,
                                       boxSize,
                                       boxSize,
                                       NULL,
                                       kvImageEdgeExtend);
    if (error) {
        NSLog(@"vImage error %zd", error);
    }
    CGImageRef modifiedImageRef = vImageCreateCGImageFromBuffer(&_outputImageBuffer,
                                                                &_inputImageFormat,
                                                                NULL,
                                                                NULL,
                                                                kvImageNoFlags,
                                                                &error);
    UIImage *returnImage = [UIImage imageWithCGImage:modifiedImageRef];
    CGImageRelease(modifiedImageRef);
    return returnImage;
}
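If the first call really is slower than the rest, it helps to report it separately rather than letting it skew an average. The following is a minimal sketch in plain C++ using std::chrono; `Timing` and `measure` are hypothetical names made up for this example, and `work` stands for whatever call you are timing (for instance a blur that captures buffers set up once beforehand).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Result of one measurement run (hypothetical helper type).
struct Timing {
    double firstCallMs;   // the first (warm-up) call, reported on its own
    double steadyMeanMs;  // mean duration of the calls after the first
};

// Run `work` once as a warm-up, then time `iterations` further calls.
Timing measure(const std::function<void()>& work, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    const auto t0 = Clock::now();
    work();  // first call: may include one-off setup costs
    const auto t1 = Clock::now();

    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto t2 = Clock::now();

    return Timing{ Ms(t1 - t0).count(), Ms(t2 - t1).count() / iterations };
}
```

Passing a lambda that reuses already-allocated input and output buffers, and comparing firstCallMs with steadyMeanMs, should show whether a ~20ms figure comes from the steady state or mostly from the first call.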