I'm new to C, C++ and OpenCL and doing my best to learn them at the moment. Here's a pre-existing C++ class with a member function that I'm trying to figure out how to port to OpenCL, using either the C or C++ bindings.
#include <vector>
using namespace std;

class Test {
private:
    double a;
    vector<double> b;
    vector<long> c;
    vector<vector<double> > d;

public:
    double foo(long x, double y) {
        // mathematical operations
        // using x, y, a, b, c, d
        // and also b.size()
        // to calculate return value
        return 0.0;
    }
};
Broadly, my question is how to pass all the class members that this function accesses into the binding and the kernel. I understand how to pass in the scalar values, but I'm not sure about the vector members. Is there perhaps a way to pass in pointers to each of the above members, or to memory-map them so that OpenCL's view of them stays in sync with host memory? Broken down, my questions are as below.
I would really appreciate either C or C++ binding and kernel code example source code in answers.
Many thanks.
You have to allocate an OpenCL buffer and copy your CPU data into it. An OpenCL buffer has a fixed size, so you either have to recreate it if your data size changes, or you make it "big enough" and use only a subsection of it if less memory is needed. For example, to create a buffer for b and at the same time copy all of its data to the device:
cl_int errorcode;
cl_mem buffer_b = clCreateBuffer(
    context,                                  // OpenCL context
    CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,  // Only read access from kernel,
                                              // copy data from host
    sizeof(cl_double) * b.size(),             // Buffer size in bytes
    &b[0],                                    // Pointer to data to copy
    &errorcode);                              // Return code
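If b later changes on the host, you do not necessarily have to recreate the buffer: as long as the new contents still fit, you can overwrite a subsection of the existing buffer. A minimal sketch, assuming the buffer is at least sizeof(cl_double) * b.size() bytes large and queue is an existing command queue:
// Copy the current contents of b into the first b.size() elements of the buffer.
// Blocking write: the call returns once the host data has been consumed.
clEnqueueWriteBuffer(queue, buffer_b, CL_TRUE, 0,
                     sizeof(cl_double) * b.size(), &b[0],
                     0, NULL, NULL);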
It is also possible to directly map host memory (CL_MEM_USE_HOST_PTR), but this imposes some restrictions on the alignment of, and access to, the host memory after creating the buffer. Basically, the host memory can contain garbage while you are not currently mapping it.
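To illustrate the mapping approach, here is a rough sketch (not the only way to do it); it assumes context and queue already exist and it ignores error handling. The data behind b is only guaranteed to be valid on the host while the buffer is mapped:
cl_int err;

// The buffer wraps the host memory backing b instead of copying it.
cl_mem buffer_b = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
                                 sizeof(cl_double) * b.size(), &b[0], &err);

// ... enqueue kernels using buffer_b ...

// To access the data from the host again, map the buffer ...
cl_double *mapped = (cl_double *)clEnqueueMapBuffer(
    queue, buffer_b, CL_TRUE, CL_MAP_READ, 0,
    sizeof(cl_double) * b.size(), 0, NULL, NULL, &err);

// ... read through 'mapped' here ...

// ... and unmap it when done so the device may use the buffer again.
clEnqueueUnmapMemObject(queue, buffer_b, mapped, 0, NULL, NULL);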
It depends. Are the sizes of the vectors in the second dimension consistently equal? Then just flatten them when uploading them to the OpenCL device, as sketched below. Otherwise it gets more complicated.
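For the consistent case, a sketch of flattening d row by row before creating the buffer (assuming every inner vector has the same length, and reusing context and errorcode from above):
// Flatten d into one contiguous array, row after row.
size_t rows = d.size();
size_t cols = d.empty() ? 0 : d[0].size();
std::vector<cl_double> d_flat;
d_flat.reserve(rows * cols);
for (size_t i = 0; i < rows; ++i)
    d_flat.insert(d_flat.end(), d[i].begin(), d[i].end());

cl_mem buffer_d = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                 sizeof(cl_double) * d_flat.size(), &d_flat[0],
                                 &errorcode);
// In the kernel, element (i, j) is then d[i * cols + j].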
You declare buffer arguments as __global pointers in your kernel. For example, __global double *b would be appropriate for the buffer created above. You can simply use array notation in the kernel to access the individual elements in the buffer.
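A small illustrative fragment (the kernel name is just a placeholder):
__kernel void example(__global double *b) {
    // Array notation works directly on the __global pointer.
    double first = b[0];
    double third = b[2];
    // ...
}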
You cannot query the buffer size from within the kernel, so you have to pass it manually. This can also happen implicitly, e.g. if the number of work items matches the size of b.
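As an illustration of the implicit case: if you enqueue exactly b.size() work items, each work item can use its global ID as the index, and no explicit size argument is needed. A sketch (this is a separate toy kernel, not the foo kernel below):
__kernel void scale_b(__global double *b, double y) {
    // One work item per element; the NDRange size plays the role of b_size.
    size_t i = get_global_id(0);
    b[i] = b[i] * y;
}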
A kernel which can access all of the data for the computation could look like this:
__kernel void foo(long x, double y, double a, __global double* b, int b_size,
                  __global long* c, __global double* d,
                  __global double* result) {
    // Here be dragons
    *result = 0.0;
}
Note that you also have to allocate memory for the result, and it might be necessary to pass additional size arguments should you need them.
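For example, a minimal result buffer holding a single cl_double that the kernel writes and the host reads back could be created like this:
cl_mem result_buffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
                                      sizeof(cl_double), NULL, &errorcode);
You would call the kernel as follows: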
// Create/fill buffers
// ...
// Set arguments
cl_long x_arg = x;  // ensure the host value matches the kernel's 64-bit long
clSetKernelArg(kernel, 0, sizeof(cl_long), &x_arg);
clSetKernelArg(kernel, 1, sizeof(cl_double), &y);
clSetKernelArg(kernel, 2, sizeof(cl_double), &a);
clSetKernelArg(kernel, 3, sizeof(cl_mem), &b_buffer);
cl_int b_size = b.size();
clSetKernelArg(kernel, 4, sizeof(cl_int), &b_size);
clSetKernelArg(kernel, 5, sizeof(cl_mem), &c_buffer);
clSetKernelArg(kernel, 6, sizeof(cl_mem), &d_buffer);
clSetKernelArg(kernel, 7, sizeof(cl_mem), &result_buffer);
// Enqueue kernel
clEnqueueNDRangeKernel(queue, kernel, /* ... depends on your domain */);
// Read back result
cl_double result;
clEnqueueReadBuffer(queue, result_buffer, CL_TRUE, 0, sizeof(cl_double), &result,
0, NULL, NULL);
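The exact NDRange arguments depend on how you parallelize foo. As a purely illustrative assumption, if a single work item computes the whole result, the enqueue call might look like this:
size_t global_size = 1;  // one work item computes the single result
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, NULL);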