I am writing OpenCL code to find the optimum work-group size for maximum occupancy on a GPU. For this, I want a function that returns the maximum number of work-items per compute unit.
Basically, I am porting this from CUDA code and I want an equivalent of maxThreadsPerMultiProcessor.
In CUDA these were the values returned on device query:
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
In OpenCL: CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CUDA doesn't ask for any kernel info to return this value. I need an equivalent function for OpenCL. Thanks in advance.
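To get the maximum number of work-items per compute unit, use clGetDeviceInfo() with the CL_DEVICE_MAX_WORK_GROUP_SIZE parameter. This returns the "maximum number of work-items in a work-group that a device is capable of executing on a single compute unit", which is the closest standard query to what you want. Note that it is a per-work-group limit, so it corresponds to CUDA's maximum threads per block (1024 in your output) rather than to the 2048 threads per multiprocessor.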
The optimum work-group size depends not only on the device, but also on the specific kernel being used. For this purpose you can use the clGetKernelWorkGroupInfo() function with the CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE flag. This "returns the preferred multiple of workgroup size for launch", which is a "performance hint".
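One possible way to use the hint is to round the work-group limit down to a multiple of the preferred size and take that as a starting point for tuning. The sketch below is only a heuristic, not something the OpenCL specification prescribes, and pick_local_size is a hypothetical name. The two inputs would come from clGetKernelWorkGroupInfo() with CL_KERNEL_WORK_GROUP_SIZE (the limit for that particular kernel) and CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.

```c
#include <stddef.h>

/* Hypothetical helper: returns the largest multiple of preferred_multiple
   that does not exceed max_wg_size. Falls back to max_wg_size when the
   multiple is zero or larger than the limit. */
size_t pick_local_size(size_t max_wg_size, size_t preferred_multiple)
{
    if (preferred_multiple == 0 || preferred_multiple > max_wg_size)
        return max_wg_size;
    return (max_wg_size / preferred_multiple) * preferred_multiple;
}
```

For example, a limit of 1000 with a preferred multiple of 64 gives 960. The best size for a given kernel may still be smaller, so it is worth timing a few multiples.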
If you're using the C++ bindings, use the getInfo() and getWorkGroupInfo() methods with device and kernel objects, respectively.