
Does Global Work Size Need to be Multiple of Work Group Size in OpenCL?

Hello. Does the global work size need to be a multiple of the work-group size, in each dimension, in OpenCL?

If so, is there a standard way of handling matrices whose dimensions are not a multiple of the work-group dimensions? I can think of two possibilities:

1. Dynamically set the work-group dimensions to a factor of the global work dimensions. (This would incur the overhead of finding a factor and could leave the work group at a non-optimal size.)

2. Increase the global work dimensions to the nearest multiple of the work-group dimensions, keeping all input and output buffers the same but checking bounds in the kernel to avoid segfaulting, i.e. do nothing in the work-items that fall outside the bounds of the desired output. (This seems like the better way.)

Would the second way work? Is there a better way? (Or is it not necessary because work group dimensions need not divide global work dimensions?)
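To make the cost of the first possibility concrete, here is a minimal sketch in plain C. The helper name `largest_dividing_local_size` and the `max_local` limit are hypothetical and not part of the OpenCL API; a real program would presumably take the limit from the device's maximum work-group size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL call): returns the largest divisor of
 * global_size that does not exceed max_local. */
size_t largest_dividing_local_size(size_t global_size, size_t max_local)
{
    assert(global_size > 0 && max_local > 0);

    /* Start from the largest size we are allowed to use. */
    size_t candidate = global_size < max_local ? global_size : max_local;

    /* Walk downwards until the candidate divides the global size.
     * The loop always stops at 1 at the latest, because 1 divides everything. */
    while (global_size % candidate != 0) {
        --candidate;
    }
    return candidate;
}
```

With a limit of 256, a global size of 1024 keeps the full 256, a global size of 651 drops to 217, and a prime global size such as 653 collapses to work groups of a single work-item, which is the non-optimal case mentioned above.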

Thanks!

asked Jun 30 '10 09:06 by Junier




1 Answer

Thanks for the link, Chad. But actually, if you read on:

If local_work_size is specified, the values specified in global_work_size[0], … global_work_size[work_dim - 1] must be evenly divisible by the corresponding values specified in local_work_size[0], … local_work_size[work_dim - 1].

So YES, the global work size must be a multiple of the local work size in every dimension.
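As a rough illustration of that rule, the divisibility condition can be checked on the host before enqueueing the kernel. This is only a sketch: `local_size_divides_global` is a made-up helper name, not an OpenCL function, and the parameters simply mirror the `work_dim`, `global_work_size` and `local_work_size` arguments quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-flight check (not part of the OpenCL API): returns true
 * only if every global size is evenly divisible by the matching local size. */
bool local_size_divides_global(size_t work_dim,
                               const size_t *global_work_size,
                               const size_t *local_work_size)
{
    assert(global_work_size != NULL && local_work_size != NULL);

    for (size_t i = 0; i < work_dim; ++i) {
        /* A zero local size is never valid, and a non-zero remainder
         * means the work groups do not tile this dimension exactly. */
        if (local_work_size[i] == 0 ||
            global_work_size[i] % local_work_size[i] != 0) {
            return false;
        }
    }
    return true;
}
```

For example, a 1024 x 768 global size passes with 16 x 16 work groups, while 1000 x 768 does not, because 1000 leaves a remainder of 8 when divided by 16. As far as I can tell from the spec, that second case is what `clEnqueueNDRangeKernel` rejects with `CL_INVALID_WORK_GROUP_SIZE`.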

I also think that rounding the global work size up to the nearest multiple of the local work size and being careful about bounds in the kernel should work. I'll post a comment when I get around to trying it.
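A minimal sketch of that idea, untested on a real device: the helper `round_up_to_multiple`, the kernel `scale_in_place` and the CPU stand-in `simulate_padded_launch` are all hypothetical names chosen for illustration. The kernel source is only held in a string here, and the plain C loop imitates what the padded launch would do so the bounds check can be seen in isolation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest multiple of `local` that is >= `global`. */
size_t round_up_to_multiple(size_t global, size_t local)
{
    assert(local > 0);
    size_t remainder = global % local;
    return remainder == 0 ? global : global + (local - remainder);
}

/* Illustrative OpenCL C kernel (an assumption, not from the original post).
 * Work-items in the padded region return before touching any buffer. */
const char *const scale_kernel_src =
    "__kernel void scale_in_place(__global float *data,\n"
    "                             const uint rows, const uint cols)\n"
    "{\n"
    "    const size_t col = get_global_id(0);\n"
    "    const size_t row = get_global_id(1);\n"
    "    if (row >= rows || col >= cols) return; /* padded work-item */\n"
    "    data[row * cols + col] *= 2.0f;\n"
    "}\n";

/* CPU stand-in for the padded launch: iterates over the rounded-up global
 * range and applies the same bounds check as the kernel above.
 * Returns how many "work-items" actually touched the buffer. */
size_t simulate_padded_launch(float *data, size_t rows, size_t cols,
                              size_t local_x, size_t local_y)
{
    const size_t global_x = round_up_to_multiple(cols, local_x);
    const size_t global_y = round_up_to_multiple(rows, local_y);
    size_t active = 0;

    for (size_t gy = 0; gy < global_y; ++gy) {
        for (size_t gx = 0; gx < global_x; ++gx) {
            if (gy >= rows || gx >= cols) {
                continue; /* out of bounds: do nothing */
            }
            data[gy * cols + gx] *= 2.0f;
            ++active;
        }
    }
    return active;
}
```

With a 2 x 3 matrix and 4 x 4 work groups, the global range is padded to 4 x 4, so 16 work-items are launched but only 6 do any work. The buffers keep their original size; the real matrix dimensions have to be passed to the kernel as extra arguments so it can perform the check.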

answered Sep 30 '22 16:09 by Junier