
OpenCL select/delete points from large array

Tags:

opencl

I have an array of 2M+ points (planned to be increased to 20M in due course) that I am running calculations on via OpenCL. I'd like to delete any points that fall within a random triangle geometry.

How can I do this within an OpenCL kernel process?

I can already:

  • identify those points that fall outside the triangle (simple point in poly algorithm in the kernel)

  • pass their coordinates to a global output array.
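For reference, a point-in-triangle test of the kind mentioned above can be sketched on the CPU as follows. This is only an illustration, not the asker's actual kernel code: the `Point` struct and the function names `edgeSide` and `insideTriangle` are hypothetical, and points lying exactly on an edge are counted as inside.

```cpp
// Hypothetical 2D point type, standing in for whatever the kernel uses (e.g. float2).
struct Point { float x, y; };

// Cross product of (b - a) and (p - a). Its sign tells which side of the
// directed edge a->b the point p lies on; zero means p is on the edge's line.
static float edgeSide(Point a, Point b, Point p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Returns true if p is inside or on the boundary of triangle (a, b, c).
// Works for either winding order: p is inside when it is never on
// opposite sides of two different edges.
bool insideTriangle(Point p, Point a, Point b, Point c) {
    float s0 = edgeSide(a, b, p);
    float s1 = edgeSide(b, c, p);
    float s2 = edgeSide(c, a, p);
    bool hasNeg = (s0 < 0) || (s1 < 0) || (s2 < 0);
    bool hasPos = (s0 > 0) || (s1 > 0) || (s2 > 0);
    return !(hasNeg && hasPos);
}
```

The arithmetic uses only multiplications and subtractions, so it should carry over to an OpenCL C kernel with little change.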

But:

  • an OpenCL global output buffer cannot change size, so I allocate it at the same size as the input array of points

  • As a result, 0,0 points occur in the final output wherever a point falls within the triangle

  • The output array is therefore no smaller than the input.

Can the 0,0 points be deleted within the OpenCL context?

N.B. I am coding in OpenFrameworks, so my C++ code links to .cl kernel files.

sam_mcelhinney asked May 09 '26 12:05



1 Answer

Here is an alternative for the case where most of the points pass the condition and would therefore hit the atomic counter:

Keep a counter in local memory and increment it with a local atomic. To merge the local count into the global counter, use atomic_add(), which returns the "previous" global value. That value is the work-group's starting offset in the output array, so you just copy the group's results to that index and upwards.

This should give a noticeable speed-up, since the work-items synchronise locally and each work-group touches the global atomic only once. The copy to global memory can run in parallel, since the reserved address ranges never overlap.

For example:

__kernel void mykernel(__global MyType * global_out, __global int * global_count, __global MyType * global_in){
   int lid = get_local_id(0);
   int lws = get_local_size(0);
   int idx = get_global_id(0);

   __local int local_count;
   __local int global_val;
   //I am using a local container, but a local array of pointers to global is possible as well
   __local MyType local_out[WG_SIZE]; //WG_SIZE must be at least your work-group size
   if(lid==0){
      local_count = 0; global_val = -1;
   }
   barrier(CLK_LOCAL_MEM_FENCE);

   //Classify them
   if(global_in[idx] == ....)
       local_out[atomic_inc(&local_count)] = global_in[idx]; //atomic_inc takes a pointer and returns the old count

   barrier(CLK_LOCAL_MEM_FENCE);

   //If nothing in this work-group passed the test, we are done
   if(local_count > 0){
      //Only the first local ID does the atomic to global
      if(lid == 0)
         global_val = atomic_add(global_count,local_count);

      //Resync all the local workers here
      barrier(CLK_LOCAL_MEM_FENCE);

      //Copy all the data: each work-item starts at its own lid and strides by the work-group size
      for(int i=lid; i<local_count; i+=lws)
          global_out[global_val+i] = local_out[i];
   }
}

NOTE: I didn't compile it, but it should more or less work.
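The reserve-then-copy logic can also be checked on the CPU without an OpenCL device. The following is a minimal sketch that emulates the work-groups one after another. The names `Point`, `compactGroup` and `compact` are hypothetical, the keep/discard test is passed in as a callable rather than fixed, and `std::atomic<int>::fetch_add` stands in for `atomic_add()` because it likewise returns the previous value. It assumes `groupSize` is greater than zero.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical point type, standing in for MyType in the kernel.
struct Point { float x, y; };

// Emulates one work-group: gather the survivors into a group-local buffer,
// reserve a contiguous output range with a single fetch_add, then copy.
template <typename Keep>
void compactGroup(const std::vector<Point>& in, std::size_t begin, std::size_t end,
                  Keep keep, std::vector<Point>& out, std::atomic<int>& globalCount) {
    std::vector<Point> localOut;  // plays the role of the __local local_out[] array
    for (std::size_t i = begin; i < end; ++i)
        if (keep(in[i])) localOut.push_back(in[i]);
    if (localOut.empty()) return;  // nothing to reserve for this group
    // fetch_add returns the value before the addition: the group's start offset.
    int base = globalCount.fetch_add(static_cast<int>(localOut.size()));
    std::copy(localOut.begin(), localOut.end(), out.begin() + base);
}

// Processes the input in chunks of groupSize and returns only the kept points.
template <typename Keep>
std::vector<Point> compact(const std::vector<Point>& in, std::size_t groupSize, Keep keep) {
    std::vector<Point> out(in.size());  // worst case: every point is kept
    std::atomic<int> globalCount{0};
    for (std::size_t begin = 0; begin < in.size(); begin += groupSize)
        compactGroup(in, begin, std::min(begin + groupSize, in.size()),
                     keep, out, globalCount);
    // The final counter value is the number of valid entries; trim to it.
    out.resize(static_cast<std::size_t>(globalCount.load()));
    return out;
}
```

In this sequential emulation the output keeps the input order. On a device, work-groups reach the global atomic in no fixed order, so the groups' ranges may land in any order in the output. The host would read global_count back after the kernel finishes and treat only that many entries of global_out as valid.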

DarkZeros answered May 11 '26 01:05



