I'm not quite clear on the actual meaning of CL_DEVICE_LOCAL_MEM_SIZE, which is obtained through the clGetDeviceInfo function. Does this value indicate the total of all the available local memory on a device, or the upper limit of local memory available to a single work-group?
TL;DR: It is per compute unit, and therefore also the maximum that can be allocated to a single work-group.
This value is the amount of local memory available on each compute unit in the device. Since a work-group is assigned to a single compute unit, this is also the maximum amount of local memory that any work-group can have.
For performance reasons on many GPUs, it is usually desirable to have multiple work-groups running on each compute unit concurrently (to hide memory access latency, for example). If one work-group uses all of the available local memory, the device will not be able to schedule any other work-groups onto the same compute unit until it has finished. If possible, it is recommended to limit the amount of local memory each work-group uses (to e.g. a quarter of the total local memory) to allow multiple work-groups to run on the same compute unit concurrently.
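To make the trade-off concrete, here is a minimal sketch of the arithmetic, considering local memory only. The function names and the 48 KiB figure used in the usage note are hypothetical; in real code the first argument would be the value reported for CL_DEVICE_LOCAL_MEM_SIZE. Registers, hardware work-group slots, and allocation granularity can all reduce the real number further, so treat the result as an upper bound rather than a prediction.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on how many work-groups can be resident on one compute unit
 * at the same time, looking at local memory alone.
 * local_mem_per_cu:    value reported for CL_DEVICE_LOCAL_MEM_SIZE, in bytes
 * local_mem_per_group: local memory one work-group of the kernel uses, in bytes
 * A result of 0 means a single work-group does not fit at all. */
static size_t max_groups_by_local_mem(size_t local_mem_per_cu,
                                      size_t local_mem_per_group)
{
    assert(local_mem_per_group > 0); /* zero usage means no limit from local memory */
    return local_mem_per_cu / local_mem_per_group; /* integer division rounds down */
}

/* Local memory budget per work-group if we want target_groups of them
 * to be able to share one compute unit. */
static size_t local_mem_budget_per_group(size_t local_mem_per_cu,
                                         size_t target_groups)
{
    assert(target_groups > 0);
    return local_mem_per_cu / target_groups;
}
```

For example, assuming a device that reports 49152 bytes, a work-group that uses all 49152 bytes leaves room for only one resident group, while keeping each group to local_mem_budget_per_group(49152, 4), i.e. 12288 bytes, allows up to four as far as local memory is concerned.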