I have an array of 2M+ points (planned to be increased to 20M in due course) that I am running calculations on via OpenCL. I'd like to delete any points that fall within a random triangle geometry.
How can I do this within an OpenCL kernel process?
I can already:
identify those points that fall outside the triangle (simple point in poly algorithm in the kernel)
pass their coordinates to a global output array.
But:
an openCL global output array cannot be variable and so I initialise it to match the input array of points in terms of size
As a result, 0,0 points occur in the final output when a point falls within the triangle
The output array therefore does not result in any reduction per se.
Can the 0,0 points be deleted within the openCL context?
n.b. I am coding in OpenFrameworks, so c++ implementations are linking to .cl files
Just an alternative for the case where most of the points fall inside the atomic condition:
It is possible to have a local counter, and local atomic. Then to merge that atomic to the global value it is possible to use atomic_add(). Witch will return the "previous" global value. So, you just copy the indexes to that address and up.
It should be a noticeable speed up, since the threads will sync locally and only once globally. The global copy can be parallel since the address will never overlap.
For example:
__kernel mykernel(__global MyType * global_out, __global int * global_count, _global MyType * global_in){
int lid = get_local_id(0);
int lws = get_local_size(0);
int idx = get_global_id(0);
__local int local_count;
__local int global_val;
//I am using a local container, but a local array of pointers to global is possible as well
__local MyType local_out[WG_SIZE]; //Ensure this is higher than your work_group size
if(lid==0){
local_count = 0; global_val = -1;
}
barrier(CLK_LOCAL_MEM_FENCE);
//Classify them
if(global_in[idx] == ....)
local_out[atomic_inc(local_count)] = global_in[idx];
barrier(CLK_LOCAL_MEM_FENCE);
//If not, we are done
if(local_count > 0){
//Only the first local ID does the atomic to global
if(lid == 0)
global_val = atomic_add(global_count,local_count);
//Resync all the local workers here
barrier(CLK_LOCAL_MEM_FENCE);
//Copy all the data
for(int i=0; i<local_count; i+=lws)
global_out[global_val+i] = local_out[i];
}
}
NOTE: I didn't compile it but should more or less work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With