How to launch custom OpenCL kernel in OpenCV (3.0.0) OCL?

Question

I'm probably misusing OpenCV by using it as wrapper to the official OpenCL C++ bindings so that I can launch my own kernels.

However, OpenCV does have classes like Program, ProgramSource, Kernel, Queue, etc. that seem to tell me that I can launch my own (even non-image-based) kernels with OpenCV. I am having trouble finding documentation out there for these classes, let alone examples. So, I took a stab at it so far:

#include <fstream>
#include <iostream>

#include "opencv2/opencv.hpp"
#include "opencv2/core/ocl.hpp"

#define ARRAY_SIZE 128

using namespace std;
using namespace cv;

int main(int, char)
{
    std::ifstream file("kernels.cl");
    std::string kcode(std::istreambuf_iterator<char>(file),
        (std::istreambuf_iterator<char>()));

    cv::ocl::ProgramSource * programSource;
    programSource = new cv::ocl::ProgramSource(kcode.c_str());

    cv::String errorMessage;
    cv::ocl::Program * program;
    program = new cv::ocl::Program(*programSource, NULL, errorMessage);

    cv::ocl::Kernel * kernel;
    kernel = new cv::ocl::Kernel("simple_add", *program);
    /* I'm stuck here at the args. */

    size_t globalSize[2] = { ARRAY_SIZE, 1 };
    size_t localSize[2] = { ARRAY_SIZE, 1 };    
    kernel->run(ARRAY_SIZE, globalSize, localSize, true);

    return 0;
}

Note that I haven't set up my host variables yet. I'm stuck at kernel->args(...). There are 15 overloads and none of them specify what order I should specify the following, per argument:

The parameter index, so I manually match the parameter in the order given in the kernel.
The host variable itself.
The host variable's array size - typically I say something like sizeof(int) * ARRAY_SIZE, though I used to specify that on the clEnqueueWriteBuffer function in plain OpenCL.
Device buffer memory access, for example CL_MEM_READ_ONLY

It doesn't look like I call enqueueWriteBufer(...), enqueueNDRangeKernel(...), or enqueueReadBuffer(...) because (I guess) the kernel->run() does all of that for me under the hood. I assume that kernel->run() will write the new values to my output parameter.

I didn't specify a command queue, device, or context. I think that there is only one command queue and one context, and the default device - all created under-the-hood and are accessible from these classes.

So again, how do I use the args function of the kernel?

user4588286 · Accepted Answer

Although I am not 100% sure, I figured out a way to do this. This example contains tips on how to pass/retrieve data to/from a custom kernel using cv::UMat, basic types (e.g. int/float/uchar), and Image2D.

#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>

using namespace std;

void main()
{
    if (!cv::ocl::haveOpenCL())
    {
        cout << "OpenCL is not avaiable..." << endl;
        return;
    }
    cv::ocl::Context context;
    if (!context.create(cv::ocl::Device::TYPE_GPU))
    {
        cout << "Failed creating the context..." << endl;
        return;
    }

    // In OpenCV 3.0.0 beta, only a single device is detected.
    cout << context.ndevices() << " GPU devices are detected." << endl;
    for (int i = 0; i < context.ndevices(); i++)
    {
        cv::ocl::Device device = context.device(i);
        cout << "name                 : " << device.name() << endl;
        cout << "available            : " << device.available() << endl;
        cout << "imageSupport         : " << device.imageSupport() << endl;
        cout << "OpenCL_C_Version     : " << device.OpenCL_C_Version() << endl;
        cout << endl;
    }

    // Select the first device
    cv::ocl::Device(context.device(0));

    // Transfer Mat data to the device
    cv::Mat mat_src = cv::imread("Lena.png", cv::IMREAD_GRAYSCALE);
    mat_src.convertTo(mat_src, CV_32F, 1.0 / 255);
    cv::UMat umat_src = mat_src.getUMat(cv::ACCESS_READ, cv::USAGE_ALLOCATE_DEVICE_MEMORY);
    cv::UMat umat_dst(mat_src.size(), CV_32F, cv::ACCESS_WRITE, cv::USAGE_ALLOCATE_DEVICE_MEMORY);

    std::ifstream ifs("shift.cl");
    if (ifs.fail()) return;
    std::string kernelSource((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
    cv::ocl::ProgramSource programSource(kernelSource);

    // Compile the kernel code
    cv::String errmsg;
    cv::String buildopt = cv::format("-D dstT=%s", cv::ocl::typeToStr(umat_dst.depth())); // "-D dstT=float"
    cv::ocl::Program program = context.getProg(programSource, buildopt, errmsg);

    cv::ocl::Image2D image(umat_src);
    float shift_x = 100.5;
    float shift_y = -50.0;
    cv::ocl::Kernel kernel("shift", program);
    kernel.args(image, shift_x, shift_y, cv::ocl::KernelArg::ReadWrite(umat_dst));

    size_t globalThreads[3] = { mat_src.cols, mat_src.rows, 1 };
    //size_t localThreads[3] = { 16, 16, 1 };
    bool success = kernel.run(3, globalThreads, NULL, true);
    if (!success){
        cout << "Failed running the kernel..." << endl;
        return;
    }

    // Download the dst data from the device (?)
    cv::Mat mat_dst = umat_dst.getMat(cv::ACCESS_READ);

    cv::imshow("src", mat_src);
    cv::imshow("dst", mat_dst);
    cv::waitKey();
}

Below is a "shift.cl" file.

__constant sampler_t samplerLN = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR;
__kernel void shift(
   __global const image2d_t src,
   float shift_x,
   float shift_y,
   __global uchar* dst,
   int dst_step, int dst_offset, int dst_rows, int dst_cols)
{
   int x = get_global_id(0);
   int y = get_global_id(1);
   if (x >= dst_cols) return;
   int dst_index = mad24(y, dst_step, mad24(x, (int)sizeof(dstT), dst_offset));
   __global dstT *dstf = (__global dstT *)(dst + dst_index);
   float2 coord = (float2)((float)x+0.5f+shift_x, (float)y+0.5f+shift_y);
   dstf[0] = (dstT)read_imagef(src, samplerLN, coord).x;
}

The point is the use of UMat. We recieve 5 parameters in the kernel (*data_ptr, int step, int offset, int rows, int cols) with KernelArg::ReadOnly(umat); 3 (*data_ptr, int step, int offset) with KernelArg::ReadOnlyNoSize(umat); and only 1 (*data_prt) with KernelArg::PtrReadOnly(umat). This rule is the same for WriteOnly and ReadWrite.

The step and offset are required when accessing the data array, since UMat may not be dense matrix due to the memory-address alignment.

cv::ocl::Image2D can be constructed from an UMat instance, and can be directly passed to kernel.args(). With image2D_t and sampler_t, we can benefit from GPU's hardware texture-units for linear-interpolation sampling (with real-valued pixel coordinates).

Note that the "-D xxx=yyy " build-option offers text replacement from xxx to yyy in the kernel code.

You can find more codes at my post: http://qiita.com/tackson5/items/8dac6b083071d31baf00

How to launch custom OpenCL kernel in OpenCV (3.0.0) OCL?

Tags:

c++

opencv

gpgpu

opencl

Mickael Caruso

1 Answers

user4588286

Recent Activity

Donate For Us

How to launch custom OpenCL kernel in OpenCV (3.0.0) OCL?

Tags:

c++

opencv

gpgpu

opencl

Mickael Caruso

1 Answers

user4588286

Related questions

Recent Activity

Donate For Us