
OpenCL maximum work-items per compute unit

Tags: c++, opencl

I am writing OpenCL code to find the optimum work-group size for maximum occupancy on a GPU. For this, I want a function that returns the maximum number of work-items per compute unit.

I am porting this from CUDA code and want an equivalent of maxThreadsPerMultiProcessor. In CUDA, the device query returned these values:

Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024

In OpenCL:

CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024

CUDA doesn't ask for kernel info to return this value. I need an equivalent function for OpenCL. Thanks in advance.

Shailesh Tripathi asked Apr 17 '26 10:04


1 Answer

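The closest query in core OpenCL is clGetDeviceInfo() with the CL_DEVICE_MAX_WORK_GROUP_SIZE flag. This returns the "maximum number of work-items in a work-group that a device is capable of executing on a single compute unit". Note that this is a per-work-group limit, so it corresponds to CUDA's maxThreadsPerBlock (1024 in your query) rather than maxThreadsPerMultiProcessor. As far as I know, core OpenCL has no device query for the total number of work-items resident on a compute unit.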
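As a rough illustration of how a work-group size relates to occupancy, the sketch below computes how many work-items stay resident on one compute unit when it is filled with whole work-groups. The per-compute-unit limit is an assumed input (for example the 2048 reported by the CUDA device query in the question), the function names are hypothetical, and register and local-memory limits are ignored.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an OpenCL API): number of work-items resident on
// one compute unit when it is packed with whole work-groups.
// max_items_per_cu is an assumed, externally known limit.
std::size_t resident_work_items(std::size_t max_items_per_cu,
                                std::size_t work_group_size) {
    if (work_group_size == 0 || work_group_size > max_items_per_cu) {
        return 0;  // no complete work-group fits
    }
    const std::size_t groups = max_items_per_cu / work_group_size;
    const std::size_t resident = groups * work_group_size;
    assert(resident <= max_items_per_cu);
    return resident;
}

// Theoretical occupancy in [0, 1], ignoring register and local-memory limits.
double occupancy(std::size_t max_items_per_cu, std::size_t work_group_size) {
    if (max_items_per_cu == 0) {
        return 0.0;
    }
    return static_cast<double>(
               resident_work_items(max_items_per_cu, work_group_size)) /
           static_cast<double>(max_items_per_cu);
}
```

With the numbers from the question, a limit of 2048 and a work-group size of 1024 gives full theoretical occupancy, while a work-group size of 768 fits only two groups (1536 work-items) and leaves a quarter of the slots idle.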

The optimum work-group size depends not only on the device, but also on the specific kernel being used. For this purpose you can use the clGetKernelWorkGroupInfo() function with the CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE flag. This "returns the preferred multiple of workgroup size for launch", which is a "performance hint".
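One way to combine the device limit with the kernel hint is sketched below. The helper choose_work_group_size() is hypothetical and not part of OpenCL; it assumes you have already queried the three values shown in the comments, and it simply picks the largest multiple of the preferred size that fits under both limits. Treat it as a starting point for tuning rather than a guaranteed optimum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// The inputs would typically be queried like this (dev and kernel are
// assumed to be a valid cl_device_id and cl_kernel):
//   clGetDeviceInfo(dev, CL_DEVICE_MAX_WORK_GROUP_SIZE,
//                   sizeof(size_t), &device_max, nullptr);
//   clGetKernelWorkGroupInfo(kernel, dev, CL_KERNEL_WORK_GROUP_SIZE,
//                            sizeof(size_t), &kernel_max, nullptr);
//   clGetKernelWorkGroupInfo(kernel, dev,
//                            CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
//                            sizeof(size_t), &preferred_multiple, nullptr);

// Hypothetical helper: largest work-group size that is a multiple of
// preferred_multiple and exceeds neither limit. Returns 0 if any input is 0.
std::size_t choose_work_group_size(std::size_t device_max,
                                   std::size_t kernel_max,
                                   std::size_t preferred_multiple) {
    if (device_max == 0 || kernel_max == 0 || preferred_multiple == 0) {
        return 0;
    }
    const std::size_t limit = std::min(device_max, kernel_max);
    if (limit < preferred_multiple) {
        return limit;  // not even one full multiple fits; use the limit itself
    }
    const std::size_t result = (limit / preferred_multiple) * preferred_multiple;
    assert(result > 0 && result <= limit);
    return result;
}
```

For example, a device limit of 1024, a kernel limit of 256, and a preferred multiple of 32 yields a work-group size of 256.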

If you're using the C++ bindings, use the getInfo() and getWorkGroupInfo() methods with device and kernel objects, respectively.

faken answered Apr 18 '26 23:04


