OpenCL maximum work-items per compute unit

Question

I am writing OpenCL code to find the optimum work-group size for maximum occupancy on a GPU. For this, I want a function that returns the maximum number of work-items per compute unit.

Basically, I am deriving this from CUDA code, and I want an equivalent of maxThreadsPerMultiProcessor. In CUDA, the device query returned these values:

Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024

In OpenCL:

CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024

CUDA doesn't ask for kernel info to return this value. I need an equivalent function for OpenCL. Thanks in advance.

faken · Accepted Answer

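To get the maximum number of work-items per compute unit, use clGetDeviceInfo() with the CL_DEVICE_MAX_WORK_GROUP_SIZE parameter. This returns the "maximum number of work-items in a work-group that a device is capable of executing on a single compute unit", which is about as close as the core API gets to what you want. Note that this is a per-work-group limit, so it corresponds to CUDA's maximum number of threads per block (1024 on your device) rather than to maxThreadsPerMultiProcessor (2048); as far as I know, core OpenCL has no device query that reports the latter directly.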

The optimum work-group size depends not only on the device, but also on the specific kernel being used. For this purpose you can use the clGetKernelWorkGroupInfo() function with the CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE flag. This "returns the preferred multiple of workgroup size for launch", which is a "performance hint".

If you're using the C++ bindings, use the getInfo() and getWorkGroupInfo() methods with device and kernel objects, respectively.
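As a rough illustration of how the two values might be combined, here is a minimal sketch. The helper pick_work_group_size is hypothetical (it is not part of OpenCL), and it assumes you have already queried the device limit and the kernel's preferred multiple as described above. It only clamps a desired size to the limit and rounds it down to the preferred multiple; it does not measure occupancy.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper, not an OpenCL function.
//
// The inputs are assumed to come from queries such as:
//   clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
//                   sizeof(size_t), &max_work_group_size, nullptr);
//   clGetKernelWorkGroupInfo(kernel, device,
//                   CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
//                   sizeof(size_t), &preferred_multiple, nullptr);
//
// Returns a work-group size that does not exceed max_work_group_size and,
// where possible, is a multiple of preferred_multiple.
std::size_t pick_work_group_size(std::size_t max_work_group_size,
                                 std::size_t preferred_multiple,
                                 std::size_t desired)
{
    if (max_work_group_size == 0) {
        return 0;  // no valid size exists for this limit
    }

    // Clamp the requested size into the range [1, max_work_group_size].
    std::size_t size = std::min(std::max<std::size_t>(desired, 1),
                                max_work_group_size);

    // If there is no usable hint, or the size is already smaller than the
    // hint, keep the clamped size as it is.
    if (preferred_multiple <= 1 || size < preferred_multiple) {
        return size;
    }

    // Round down to the nearest multiple of the performance hint.
    return size - (size % preferred_multiple);
}
```

For example, with a limit of 1024 and a preferred multiple of 32, a desired size of 100 is rounded down to 96, while a desired size of 2048 is clamped to 1024. Whether the result is actually optimal still depends on the kernel, so treat it as a starting point for benchmarking.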

Tags: c++, opencl

Shailesh Tripathi
