
How to effectively swap OpenCL memory buffers?

Exactly as the title suggests, I am looking for how to effectively swap two OpenCL buffers. My kernel uses two global buffers, one as input and one as output. However, I invoke my kernel in a for loop with the same NDRange, each time setting the kernel arguments, enqueueing the kernel, and swapping the buffers, because the previous output buffer serves as the input (seed) for the next iteration.

What is the appropriate way to swap these two buffers? I imagine that reading the output back into one of the already malloc'd host arrays with clEnqueueReadBuffer() and then writing it into the next input buffer with clEnqueueWriteBuffer() is an inefficient way to go. Otherwise, I am just using a temporary cl_mem variable to do my swapping.

voxeloctree asked Jun 14 '12 20:06


2 Answers

You don't need to copy anything. Just set the right kernel args using clSetKernelArg before enqueuing your kernel a second time (using clEnqueueNDRangeKernel). The buffers will stay on the device; nothing will be copied back to the host.

Both buffers have to be created with CL_MEM_READ_WRITE in this case, of course, since each one is read in one iteration and written in the next.
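
To make the idea concrete, here is a plain C stand-in for that loop (a sketch, not real OpenCL host code). The float pointers play the role of the two cl_mem handles, and step() is a hypothetical placeholder for "set both kernel args, then enqueue the kernel". The point is only that the swap exchanges two small handles and never moves the data itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for one kernel launch: reads `in`, writes `out`.
   In real host code this would be two clSetKernelArg calls followed by
   clEnqueueNDRangeKernel. */
static void step(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + 1.0f;
}

/* Runs `iters` ping-pong iterations and returns the handle that holds the
   final result. Only the two handles are swapped; no data is copied. */
static float *run_ping_pong(float *a, float *b, size_t n, int iters)
{
    float *in = a, *out = b;
    for (int it = 0; it < iters; it++) {
        step(in, out, n);
        float *tmp = in;   /* swap the handles for the next iteration */
        in = out;
        out = tmp;
    }
    return in;             /* after the last swap, `in` is the newest output */
}
```

With real buffers the temporary would be a cl_mem instead of a float pointer, which is what the question already describes doing.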

Simon answered Oct 15 '22 18:10


As the previous answer says: no, you don't need to swap buffers at all.

However, I don't agree with the proposed answer. The function clSetKernelArg() is not thread-safe when called concurrently on the same kernel object, and it is not designed to be called inside the operation loop.

The proper solution is to create two kernel objects from the same program and the same kernel function. This approach is more aligned with the OpenCL programming philosophy of "one kernel for one task". Having several kernel objects with the same code but different arguments is the way to go.

The first kernel will have:

// kernel1 reads buff1 and writes buff2
kernel1 = clCreateKernel(program, "mykernel", NULL);
clSetKernelArg(kernel1, 0, sizeof(cl_mem), &buff1);
clSetKernelArg(kernel1, 1, sizeof(cl_mem), &buff2);

And the other one will be:

// kernel2 reads buff2 and writes buff1
kernel2 = clCreateKernel(program, "mykernel", NULL);
clSetKernelArg(kernel2, 0, sizeof(cl_mem), &buff2);
clSetKernelArg(kernel2, 1, sizeof(cl_mem), &buff1);

This way, you don't need to stop the execution each iteration. You can simply run:

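for (int it = 0; it < iter; it++) {
    // `command` is the command queue. Even iterations run kernel1 (buff1 -> buff2),
    // odd iterations run kernel2 (buff2 -> buff1), assuming buff1 holds the initial input.
    clEnqueueNDRangeKernel(command, it % 2 ? kernel2 : kernel1, ....);
}
clFinish(command);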

This approach will surely be better than changing the kernel args: it is more efficient and needs fewer API calls. Additionally, on some systems clSetKernelArg() may be a blocking call due to poor API implementations, so it is better to avoid calling it as much as possible.
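
One detail the loop leaves implicit is which buffer holds the final result once it finishes, since no handle is swapped that you could inspect. A minimal sketch in C, assuming the first launch reads buff1 and writes buff2 and every later launch reverses direction (if your loop starts with the other kernel, the answer flips). The helper name result_buffer is made up for illustration:

```c
/* Hypothetical helper: returns 1 or 2, the buffer holding the newest data
   after `iters` launches. Assumes launch 0 reads buff1 and writes buff2,
   and every later launch reverses the direction. */
static int result_buffer(int iters)
{
    if (iters <= 0)
        return 1;                  /* nothing ran: the seed is still in buff1 */
    return (iters % 2) ? 2 : 1;    /* odd count ends in buff2, even in buff1 */
}
```

The returned index tells you which cl_mem to pass to clEnqueueReadBuffer() after clFinish().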

DarkZeros answered Oct 15 '22 19:10