
How to pass and access C++ vectors to OpenCL kernel?

Tags: c++, c, opencl

I'm new to C, C++ and OpenCL and doing my best to learn them at the moment. Here's a preexisting C++ function that I'm trying to figure out how to port to OpenCL using either the C or C++ bindings.

#include <vector>

using namespace std;

class Test {

private:

    double a;
    vector<double> b;
    vector<long> c;
    vector<vector<double> > d;

public:

    double foo(long x, double y) {
        // mathematical operations
        // using x, y, a, b, c, d
        // and also b.size()
        // to calculate return value
        return 0.0;
    }

};

Broadly, my question is how to pass all the class members that this function accesses into the binding and the kernel. I understand how to pass in the scalar values, but I'm not sure about the vector values. Is there perhaps a way to pass in pointers to each of the above members, or to memory map them so that OpenCL's view of them is in sync with host memory? Broken down, my questions are as follows.

  1. How do I pass in member b and c to the binding and the kernel given that these are of variable size?
  2. How do I pass in member d given that it is two dimensional?
  3. How do I access these members from within the kernel and what types will they be declared as in the arguments to the kernel? Will simply using array index notation i.e. b[0] work for access?
  4. How would I invoke an operation equivalent to b.size() within the kernel function or would I not and instead pass in the size from the binding into the kernel as an extra argument? What happens if it changes?

I would really appreciate example source code in answers, for either the C or C++ bindings and for the kernel.

Many thanks.

asked Sep 14 '12 13:09 by junkie


1 Answer

  1. You have to allocate an OpenCL buffer and copy your CPU data into it. An OpenCL buffer has a fixed size, so if your data size changes you either recreate the buffer or make it "big enough" up front and use only part of it when less memory is needed. For example, to create a buffer for b and at the same time copy all of its data to the device:

    cl_int errorcode;
    cl_mem b_buffer = clCreateBuffer(
        context, // OpenCL context
        CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, // Only read access from kernel,
                                                 // copy data from host
        sizeof(cl_double) * b.size(), // Buffer size in bytes (must be non-zero)
        &b[0], // Pointer to data to copy
        &errorcode); // CL_SUCCESS if the buffer was created
    

    It is also possible to use host memory directly (CL_MEM_USE_HOST_PTR), but this imposes some restrictions on alignment and on how you may access the host memory after creating the buffer. The implementation is allowed to cache the buffer contents in device memory, so the host copy is only guaranteed to be current while the buffer is mapped.

  2. It depends. Are the sizes of the vectors in the second dimension consistently equal? Then just flatten them into one contiguous array when uploading them to the OpenCL device. Otherwise it gets more complicated, because the kernel also needs to know where each row starts.

  3. You declare buffer arguments as __global pointers in your kernel. For example, __global double *b would be appropriate for the buffer created in 1. You can simply use array notation in the kernel to access the individual elements in the buffer.

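  4. You cannot query the buffer size from within the kernel, so you have to pass it in as an extra argument. This can also happen implicitly, e.g. if the number of work items matches the size of b. If the size changes, set the argument again (and recreate the buffer if the data no longer fits) before the next enqueue.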
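For point 2, one possible way to flatten d on the host is sketched below. It is only an illustration: the `Flattened` struct and the `flatten` and `at` helpers are hypothetical names, not part of OpenCL. The sketch stores all rows back to back and records where each row starts, so it also covers rows of different lengths. Offsets are kept as 32-bit unsigned integers on the assumption that they would be uploaded as a `__global uint*` buffer, since `size_t` is not allowed as a kernel argument type.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not an OpenCL type): a 2-D vector stored as one
// contiguous array plus the start offset of every row.
struct Flattened {
    std::vector<double> values;          // all rows back to back
    std::vector<std::uint32_t> offsets;  // offsets[i] = start of row i,
                                         // final entry = values.size()
};

Flattened flatten(const std::vector<std::vector<double>>& d) {
    Flattened out;
    out.offsets.reserve(d.size() + 1);
    for (const auto& row : d) {
        // Record where this row begins, then append its elements.
        out.offsets.push_back(static_cast<std::uint32_t>(out.values.size()));
        out.values.insert(out.values.end(), row.begin(), row.end());
    }
    // Sentinel entry so that row i has length offsets[i + 1] - offsets[i].
    out.offsets.push_back(static_cast<std::uint32_t>(out.values.size()));
    return out;
}

// Element (i, j) of the original d, using the same index arithmetic a
// kernel would use on the flat buffer.
inline double at(const Flattened& f, std::size_t i, std::size_t j) {
    return f.values[f.offsets[i] + j];
}
```

Both `values` and `offsets` can then be uploaded with clCreateBuffer in the same way as b. If every row has the same length, the offsets buffer is unnecessary and the kernel can index with `d[i * row_length + j]`, passing `row_length` as an extra argument.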

A kernel which can access all of the data for the computation could look like this:

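// double requires the cl_khr_fp64 extension on OpenCL 1.0/1.1 devices
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

// d is expected as a flattened, one-dimensional buffer
__kernel void foo(long x, double y, double a, __global double* b, int b_size,
                  __global long* c, __global double* d,
                  __global double* result) {
  // Here be dragons
  *result = 0.0;
}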

Note that you also have to allocate memory for the result. It might be necessary to pass additional size arguments should you need them. You would call the kernel as follows:

// Create/fill buffers
// ...

// Set arguments
cl_long x_arg = x; // host long may be 32 bits; the kernel's long is always 64
clSetKernelArg(kernel, 0, sizeof(cl_long), &x_arg);
clSetKernelArg(kernel, 1, sizeof(cl_double), &y);
clSetKernelArg(kernel, 2, sizeof(cl_double), &a);
clSetKernelArg(kernel, 3, sizeof(cl_mem), &b_buffer);
cl_int b_size = static_cast<cl_int>(b.size()); // explicit narrowing from size_t
clSetKernelArg(kernel, 4, sizeof(cl_int), &b_size);
clSetKernelArg(kernel, 5, sizeof(cl_mem), &c_buffer);
clSetKernelArg(kernel, 6, sizeof(cl_mem), &d_buffer);
clSetKernelArg(kernel, 7, sizeof(cl_mem), &result_buffer);
// Enqueue kernel
clEnqueueNDRangeKernel(queue, kernel, /* ... depends on your domain */);

// Read back result
cl_double result;
clEnqueueReadBuffer(queue, result_buffer, CL_TRUE, 0, sizeof(cl_double), &result,
                    0, NULL, NULL);
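
One caveat for member c: the kernel declares it as __global long*, and long in OpenCL C is always 64 bits, whereas long on the host is only 32 bits on some platforms (64-bit Windows, for example). A minimal sketch of a conversion step before creating the buffer follows. The helper name `to_device_longs` is hypothetical, and `std::int64_t` is used as a stand-in for `cl_long` purely so that the sketch compiles without the OpenCL headers.

```cpp
#include <cstdint>
#include <vector>

// Assumption: std::int64_t stands in for cl_long here; both are 64-bit
// signed integers. With the OpenCL headers available, cl_long could be
// used directly.
std::vector<std::int64_t> to_device_longs(const std::vector<long>& c) {
    // Element-wise copy; widens each value if the host long is 32 bits and
    // is a plain copy if it is already 64 bits.
    return std::vector<std::int64_t>(c.begin(), c.end());
}
```

The converted vector would then be the one passed to clCreateBuffer, with a size of `sizeof(cl_long)` times the element count.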

answered Sep 21 '22 18:09 by reima