I have seen in one post here that we can call a function from an OpenCL kernel. But in my situation, I need that complex function to be parallelized (run by all available threads) as well, so do I have to make that function a kernel too and call it straight away like function from the main kernel ? or whats possible solution for this situation? Thanks in advance
You can call helper functions from your kernel and they will be parallelized in the same manner as the kernel, imagine them as inlined inside your kernel code. So, each work item will invoke the helper function for the working set it handles.
float4 helper_function(float4 input)
{
return input.x + input.y + input.z + input.w;
}
__kernel kernel_function(const float4* arr, float4* out)
{
id = get_global_id(0);
out[id] = helper_function(arr[id]);
}
OpenCL 2.0 spec added a new feature for dynamic paralelism.
6.13.17 Enqueuing Kernels
OpenCL 2.0 allows a kernel to independently enqueue to the same device, without host
interaction. ...
In the example below my_func_B enqueus my_func_A on the device:
kernel void
my_func_A(global int *a, global int *b, global int *c)
{
...
}
kernel void
my_func_B(global int *a, global int *b, global int *c)
{
ndrange_t ndrange;
// build ndrange information
...
// example – enqueue a kernel as a block
enqueue_kernel(get_default_queue(), ndrange, ^{my_func_A(a, b, c);});
...
}
If I understand your question correctly, you want to do a separate full pass over a buffer from inside the kernel. I don't think that is possible from within the kernel, so you'd have to create the code for the "inner" pass as a separate kernel and also call that kernel separately from your host code. The output from that kernel doesn't have to be read back to the host memory, but can stay in device memory between your kernel calls.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With