How can I get the global thread ID in 2 dimensions in OpenCL? I know that for 1 dimension, the formula is:
int global_id = get_global_id(1) * get_global_size(0) + get_global_id(0);
But if I allocate like this:
size_t block_size[] = {2, 2};
size_t grid_size[] = {35, 20};
The above formula fails, giving indexes only from 0 to 35*20. The indexes should go from 0 to 35*40*2*2.
Can you recommend any good documentation or writings that could give me the intuition to understand how all of this works? Thanks!
If you're launching a 2D NDRange, then get_global_id(0) and get_global_id(1) will give you the Gx and Gy indices. You can also independently fetch the local ids using get_local_id(0/1).
There's no need to calculate it yourself.
Did you mean that you're launching a 2D thread block but want to map that thread to a position in a 1 dimensional buffer?
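If that is what you are after, the formula from the question is already the usual row-major mapping. Below is a minimal sketch in plain C (not device code) that mimics the arithmetic on the host. The helper names flatten_2d and max_flat_index are made up for illustration; gx, gy and size_x stand in for get_global_id(0), get_global_id(1) and get_global_size(0).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: row-major flattening of a 2D global ID.
 * gx stands in for get_global_id(0), gy for get_global_id(1),
 * and size_x for get_global_size(0). */
size_t flatten_2d(size_t gx, size_t gy, size_t size_x)
{
    assert(gx < size_x); /* an ID is always smaller than the size of its dimension */
    return gy * size_x + gx;
}

/* Hypothetical helper: visits every (gx, gy) pair of a size_x by size_y
 * range and returns the largest flattened index that occurs. */
size_t max_flat_index(size_t size_x, size_t size_y)
{
    size_t max_index = 0;
    for (size_t gy = 0; gy < size_y; ++gy) {
        for (size_t gx = 0; gx < size_x; ++gx) {
            size_t idx = flatten_2d(gx, gy, size_x);
            if (idx > max_index) {
                max_index = idx;
            }
        }
    }
    return max_index;
}
```

For a 35 x 20 global size, max_flat_index(35, 20) is 699, so the indices cover 0..(35*20 - 1) and nothing more.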
EDIT: After reading your comment, I thought an explanation is in order.
OpenCL launches get_global_size(0) * get_global_size(1) work-items (here 35 * 20 = 700), so you will have threads
(0 ,0) (0 ,1) ... (0,34)
(1 ,0) (1 ,1) ... (1,34)
.
.
.
(19,0) (19,1) ... (19,34)
The local work size is simply a way of splitting the total number of threads into work-groups and distributing them across the available compute units. It is quite possible that only 2 * 2 = 4 threads are running at any point in time.
The clEnqueueNDRangeKernel documentation tells us that local_work_size can be NULL, in which case the implementation decides how to break the global work size into work-groups.
In no way does local work size increase the number of threads.
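To make the relationship concrete, here is a small sketch in plain C (again host-side arithmetic, not a kernel). The function names are hypothetical, and it assumes a zero global offset and a global size that is a multiple of the local size. It shows that the local size only decides how many work-groups there are, and that a global ID can be rebuilt from a group ID and a local ID.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of work-groups along one dimension.
 * Assumes the global size is an exact multiple of the local size. */
size_t work_groups_in_dim(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Hypothetical helper: rebuilds the value get_global_id(d) would return
 * from get_group_id(d), get_local_size(d) and get_local_id(d),
 * assuming a zero global offset. */
size_t global_id_from_group(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}
```

With a 70 x 80 global size and a 2 x 2 local size this gives 35 x 40 work-groups, and the last work-item in dimension 0 (group 34, local ID 1) has global ID 69. The thread count is still 70 * 80.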
Perhaps this image explains it better than I can.

Note that the total number of work-items launched is still get_global_size(0) * get_global_size(1).
If you want your 1D indices to go from 0 to (35*40*2*2 - 1), then launch the kernel so that get_global_size(0) * get_global_size(1) is 35*40*2*2 (for example, 70 x 80).
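If you are thinking in terms of "number of blocks times block size", the global size is the product of the two in each dimension. A minimal host-side sketch in plain C, with hypothetical helper names, assuming 35 x 40 groups of 2 x 2 threads:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: global size along one dimension, given the number
 * of work-groups and the local size along that dimension. */
size_t global_size_for(size_t num_groups, size_t local_size)
{
    assert(local_size > 0);
    return num_groups * local_size;
}

/* Hypothetical helper: total number of work-items in a 2D range. */
size_t total_work_items(const size_t global_size[2])
{
    return global_size[0] * global_size[1];
}
```

Here global_size_for(35, 2) is 70 and global_size_for(40, 2) is 80, so a global size of {70, 80} with a local size of {2, 2} yields 5600 work-items and flattened indices from 0 to 5599.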
Hope this helps.