Introduction:
The goal is to capture video with OpenCV and use it as input for an OpenCL program. The transfer between the two needs to be as efficient as possible (if that were not a concern, why use OpenCL, right?).
I read that OpenCV uses OpenCL internally (UMat), and that I could reach the GPU buffer through UMat::handle. However, my attempts at this have been unsuccessful so far.
The intention is to reuse the UMat buffer as the input for the OpenCL kernels and, eventually, write the result as an image into another UMat for display.
OpenCV is only meant to produce the input for the program. Consequently, I am not interested in using the OpenCV CL wrapper (cv::ocl) but would rather use plain OpenCL (cl::...). This avoids having to include/link OpenCV in the full software.
The question:
How to access OpenCV UMat buffer through OpenCL?
Disclaimer: please be kind, as this is only meant to be a very minimal example for this question.
#include <iostream>
#include <vector>
#include <cassert>
#define __CL_ENABLE_EXCEPTIONS // enable exceptions instead of error-codes
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>

using namespace cv;
using namespace std;

int main()
{
    // OPENCL STUFF
    // Very simplified/basic/stupid/naive OpenCL context creation
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    assert(platforms.size()>0);
    std::vector<cl::Device> devices;
    platforms[0].getDevices( CL_DEVICE_TYPE_ALL, &devices);
    assert(devices.size()>0);
    cl_context_properties prop[3] =
    {
        CL_CONTEXT_PLATFORM,
        (cl_context_properties)(platforms[0])(),
        0
    };
    cl::Context context( devices[0], prop, nullptr, nullptr);

    std::string kernelStr = R"DELIMITER(
    kernel void replaceRB( global uchar3* content)
    {
        const size_t globalId = get_global_id(0);
        private uchar3 byte = content[globalId];
        char aux = byte.z;
        byte.z = byte.x;
        byte.x = aux;
        content[globalId] = byte;
    }
    )DELIMITER";

    cl::Program::Sources sources;
    sources.push_back(std::make_pair<const char*, size_t>(kernelStr.data(), kernelStr.size()));
    cl::Program program(context, sources);
    try
    {
        program.build({devices[0]}, "");
    }
    catch (...)
    {
        std::cout << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devices[0]) << std::endl;
    }
    std::vector<cl::Kernel> kernels;
    program.createKernels(&kernels);
    assert(kernels.size()>0);
    cl::CommandQueue queue(context, devices[0]);

    // OPENCV STUFF
    ocl::setUseOpenCL(true);
    cv::ocl::attachContext(platforms[0].getInfo<CL_PLATFORM_NAME>(), platforms[0](), context(), devices[0]());
    assert(ocl::haveOpenCL());
    cout << cv::ocl::Context::getDefault().ndevices() << " GPU devices are detected." << endl;
    VideoCapture cap(0); //Camera
    //VideoCapture cap("SampleVideo_1280x720_1mb.mp4"); //Video example
    assert(cap.isOpened());
    UMat frame;
    assert(cap.read(frame));

    //MIX OF BOTH opencl and opencv
    //cl::Buffer buf(context,CL_MEM_READ_WRITE, 256); // This works
    cl::Buffer buf(*((cl_mem*)frame.handle(CL_MEM_READ_WRITE)));
    int result = kernels[0].setArg(0, buf);
    std::cout << result << " == " << CL_INVALID_MEM_OBJECT << std::endl;
    queue.enqueueNDRangeKernel(kernels[0], cl::NullRange, cl::NDRange(16), cl::NDRange(4));
    queue.flush();

    //DISPLAY RESULT?
    string window_name = "Test OpenCV and OpenCL";
    namedWindow(window_name);
    imshow(window_name, frame);
    waitKey(5000);
    return 0;
}
cv::UMat and the OpenCV "transparent API" are quite unintuitive to work with, mainly because they hide the very important task of actual memory management from the client.
Specifically, in your code you pass an empty cv::UMat to cap.read, so OpenCV has to allocate the memory for the actual frame. It is not guaranteed, however, that this memory is allocated in device memory (a CL buffer). I would not be surprised if, on debugging the OpenCV source, you found the memory allocated in host RAM, hence no valid cl_mem handle.
You basically have two options:
Option 1: preallocate the cv::UMat on the device explicitly:
UMat frame = cv::UMat(cv::Size(width, height), format, cv::USAGE_ALLOCATE_DEVICE_MEMORY);
assert(cap.read(frame));
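Option 2: wrap an existing preallocated OpenCL buffer with a cv::UMat:
cv::UMat frame;
cv::ocl::convertFromBuffer(
    my_cl_mem, // existing cl_mem buffer
    pitch,     // row stride in bytes
    rows,      // image height
    cols,      // image width
    format,    // OpenCV type, e.g. CV_8UC3
    frame      // output UMat that wraps the buffer
);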
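The pitch, rows, cols and format arguments in option 2 have to describe the memory layout of the buffer being wrapped. As a rough sketch, assuming a tightly packed 8-bit, 3-channel frame (what OpenCV calls CV_8UC3) with no row padding, the pitch and the minimum buffer size could be worked out as follows. FrameLayout and packedLayout are hypothetical names used only for illustration; they are not part of OpenCV or OpenCL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not part of OpenCV): layout of a tightly packed image.
struct FrameLayout {
    std::size_t pitch;      // bytes per row, i.e. the `pitch` argument above
    std::size_t totalBytes; // minimum size the cl_mem buffer must have
};

// Assumes rows are stored back to back with no padding.
// channels = 3 and bytesPerChannel = 1 correspond to an 8-bit BGR frame.
inline FrameLayout packedLayout(int rows, int cols, int channels, int bytesPerChannel) {
    assert(rows > 0 && cols > 0 && channels > 0 && bytesPerChannel > 0);
    const std::size_t pitch =
        static_cast<std::size_t>(cols) * channels * bytesPerChannel;
    return FrameLayout{pitch, pitch * static_cast<std::size_t>(rows)};
}
```

For a 1280x720 frame this gives a pitch of 3840 bytes and a buffer of 2764800 bytes. A buffer that was allocated with row padding would need its real stride passed as the pitch instead.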
Also, because OpenCV's way of working with OpenCL is a complete, inefficient mess, I would not be surprised if passing pre-pinned host memory to cap.read and later transferring it asynchronously to the device turned out to be more efficient. Note that you can wrap any host memory pointer with a cv::Mat.