
Why are OpenCL work groups 3 dimensional?

Tags:

opencl

I am frustrated by this architecture because I can find no obvious explanation for why work groups should be 3-dimensional. Since any number of dimensions can be emulated with one-dimensional work groups, the extra dimensions seem to add complexity and make it even harder to understand the best way to divide your work into work groups.

For example, this person discovered that switching axes sped up his execution by a factor of two.

One hypothesis I have is that OpenCL wants a trivial relationship between the work item id and memory lookup to allow predictable memory operations that can be I/O optimized.

Hannes Landeholm asked Dec 24 '22 11:12


2 Answers

Work groups don't have to be three-dimensional if your application or algorithm does not require it. You can specify 1, 2, or 3 dimensions, and possibly more in the future. So use fewer dimensions when that naturally suits your application.

So why would the specification allow for more dimensions? As you pointed out, higher dimensions can be emulated using a single dimension. But some problems are naturally multi-dimensional; one example would be a 3-dimensional N-body simulation for physics or molecular modelling.

One huge advantage of choosing 3D work groups is that it reduces code complexity by a fair bit. Under the hood, the SDK you're running OpenCL on may be doing the emulation for you.
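To make the complexity point concrete, here is a minimal sketch in plain C of the index bookkeeping you would have to write yourself when emulating a 3D range with a 1D one. The names `Index3`, `flatten3` and `unflatten3` are made up for illustration and are not part of OpenCL; the layout (x varying fastest) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: a 3D index, not an OpenCL type. */
typedef struct { size_t x, y, z; } Index3;

/* Map a 3D index to a 1D index, with x varying fastest (assumed layout). */
size_t flatten3(Index3 id, size_t nx, size_t ny) {
    assert(id.x < nx && id.y < ny);
    return (id.z * ny + id.y) * nx + id.x;
}

/* Inverse of flatten3: recover the 3D index from the 1D index. */
Index3 unflatten3(size_t flat, size_t nx, size_t ny) {
    assert(nx > 0 && ny > 0);
    Index3 id;
    id.x = flat % nx;
    id.y = (flat / nx) % ny;
    id.z = flat / (nx * ny);
    return id;
}
```

With a 3D range, a kernel would read the three ids directly instead of carrying the divisions and modulos around in every kernel.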

As for the 2x performance gain in your example: this boost was the result of a much better memory access pattern, not of the hardware being inherently bad at running 2D work groups. The answer to that question explains ways to further optimize the kernel, which are great strategies for today's GPU hardware.
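A rough illustration of why swapping axes can change the access pattern, as a plain C sketch rather than the code from the linked question. It assumes a row-major buffer and that work-items adjacent in dimension 0 tend to execute together, which is typical on GPUs but not guaranteed. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (row, col) in a row-major buffer with `width` columns. */
size_t offset_row_major(size_t row, size_t col, size_t width) {
    assert(col < width);
    return row * width + col;
}

/* Distance in elements between the addresses touched by two work-items
 * whose dimension-0 ids are 0 and 1. If dimension 0 is mapped to the
 * column, neighbours touch adjacent elements; if it is mapped to the
 * row, they are a full row apart. */
size_t stride_dim0(size_t width, int dim0_is_column) {
    assert(width >= 2);
    size_t first = offset_row_major(0, 0, width);
    size_t second = dim0_is_column ? offset_row_major(0, 1, width)
                                   : offset_row_major(1, 0, width);
    return second - first;
}
```

For a 1024-wide buffer the stride is 1 element in one mapping and 1024 in the other, which is the kind of difference that shows up as a large speedup when the axes are switched.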

A more subtle benefit of using 3D work groups is that future hardware might not need to emulate the extra dimensions. Perhaps the memory, processor, etc. would be tailored to 3D work groups and reduce or eliminate the penalty for bad memory access patterns. If you write your code using 1D groups, you will miss out on a potential performance boost on these platforms. Even today it is possible to create FPGA/ASIC chips that handle 3D work groups better than GPUs do.

mfa answered Feb 23 '23 18:02


What really tells you that only 3 dimensions are allowed?

clEnqueueNDRangeKernel() uses an unsigned integer to specify the number of dimensions, and uses an array of unsigned integers for each dimension size.

The OpenCL spec states that the maximum number of dimensions is implementation-defined and reported per device as CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, which in practice is often 3 but may differ between implementations. It's just a matter of convenience, as most computational problems operate on "real world" data, which has between 1 and 3 dimensions.

Also, nobody forces you to use 3. Most applications use 1 or 2 and work perfectly fine.
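The shape of that interface can be mimicked in plain C: a dimension count plus an array of sizes, checked against a device limit. This sketch does not call OpenCL; `ndrange_total` and its `max_dims` parameter are hypothetical stand-ins for the work_dim check that the enqueue call performs against CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of work-items for an N-dimensional range.
 * Returns 0 when work_dim is outside 1..max_dims, standing in for the
 * invalid-work-dimension error a real enqueue would report. */
size_t ndrange_total(unsigned work_dim, const size_t *global_work_size,
                     unsigned max_dims) {
    if (work_dim == 0 || work_dim > max_dims) {
        return 0;
    }
    assert(global_work_size != NULL);
    size_t total = 1;
    for (unsigned d = 0; d < work_dim; ++d) {
        total *= global_work_size[d];
    }
    return total;
}
```

Nothing in this signature is specific to 3; the limit is just a number supplied by the device.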


If you are wondering why N and not always 1, you will understand once you have to use local memory. It is much easier to use local memory on an image when the work group is 2D, since the work items cover a small rectangular region of the image instead of a line of it.

You can emulate it with clever index conversions, but using the API as designed is much easier and more readable.
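One way to see the local memory benefit, as a back-of-the-envelope sketch in plain C. The scenario is an assumption, not something from the question: an image filter where each work item needs its neighbours within `radius` pixels, so the work group loads its own region plus a border into local memory. `tile_footprint` is a made-up helper.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pixels a work group of shape group_w x group_h must load
 * into local memory when every work item also reads neighbours up to
 * `radius` pixels away in each direction (assumed stencil filter). */
size_t tile_footprint(size_t group_w, size_t group_h, size_t radius) {
    assert(group_w > 0 && group_h > 0);
    return (group_w + 2 * radius) * (group_h + 2 * radius);
}
```

For 256 work items and a radius of 1, a 16x16 group loads 324 pixels while a 256x1 group loads 774, so the rectangular group reuses far more of what it loads.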

DarkZeros answered Feb 23 '23 18:02