I'm trying to learn OpenCL but I'm a having a hard time deciding which address spaces to use, as I only find assembled resources declaring what these address spaces are, but not why they exist or when to use them. The resources are at least too scattered, so with this question I hope to assemble all this information: what are all the address spaces, why do they exist, when to use which address space and what are the advantages and disadvantages regarding memory and performance.
As I understand it (which is probably too simplified), the GPU has two physical types of memory: global memory, far from the actual processors, so slow but pretty big and available to all workers, and local memory, close to the actual processors, so fast but small and not accessible from other workers.
Intuitively, the local
qualifier makes sure a variable is placed on local memory and the global
qualifier makes sure a variable is placed on global memory, though I'm not sure this is exactly what happens. This leaves the private
and constant
qualifiers. What's the purpose of those?
There also are some implicit qualifiers. For example, the specifications mention the generic address space, which is used for arguments with no qualifiers, I think. What does this do exactly? Then there also are local function variables. What's the address space for those?
Here is an example using my intuition, but without knowing what I'm actually doing:
Example:
Say I pass an array of type long
and length 10000 to a kernel which I will only use to read, then I would declare it global const
as it must be available to all workers and it will not change. Why wouldn't I use the constant
qualifier? When setting the buffer for this array via the CPU, I actually also just could have made the array read-only, which in my eyes says the same as declaring it const
. So again, when and why would I declare something constant
or global const
?
When performing memory-intensive tasks, would it be better to copy the array to a local array inside the kernel? My guess is that local memory would be too small, but what if the array only had a length of 10? When would the array be too big/small? More general: when is it worth copying data from global to local memory?
Say I also want to pass the length of this array, then I would add const int length
to the arguments of my kernel, but I'm unsure why I would omit the global
qualifier except because I have seen other people do it. After all, length
must be accessible for all workers. If I'm right, then length
would have a generic address space, but again, I don't really know what that means.
I hope someone with some experience can clear this up. That would be great not only for me, but I hope also for other enthusiasts who want to gain some practical knowledge concerning memory management on the GPU.
Constant: A small portion of cached global memory visible by all workers. Use it if you can, read only.
Global: Slow, visible by all, read or write. It is where all your data will end, so some accesses to it are always necessary.
Local: Do you need to share something in a local group? Use local! Do all your local workers access the same global memory? Use local! Local memory is only visible inside local workers, and is limited in size, however is very fast.
Private: Memory that is only visible to a worker, consider it like registers. All non defined values are private by default.
Say I pass an array of type long and length 10000 to a kernel which I will only use to read, then I would declare it global const as it must be available to all workers and it will not change. Why wouldn't I use the constant qualifier?
Actually, yes, you can and you should use constant
qualifier. Which places your data on the constant memory (a small portion of read only memory quickly accessible by all workers). This is used by GPUs to transfer uniforms to all vertex shaders.
When setting the buffer for this array via the CPU, I actually also just could have made the array read-only, which in my eyes says the same as declaring it const. So again, when and why would I declare something constant or global const?
Not really, when you create a buffer read only you are only specifiying to OpenCL you plan to use it read only, so it can do optimizations in the back, but you can actually write to it from a kernel.
global const
is just a safeguard for the developer, so you don't accidentally write to it, it will give an error at compile time.
Basically, the same as in plain C host side computing. Programs will also work fine if all memory is non-const.
When performing memory-intensive tasks, would it be better to copy the array to a local array inside the kernel? My guess is that local memory would be too small, but what if the array only had a length of 10? When would the array be too big/small? More general: when is it worth copying data from global to local memory?
It is only worth if it is read by all workers. If each worker reads a single value of the global memory, then it is not worth. Useful here:
Worker0 -> Reads 0,1,2,3
Worker1 -> Reads 0,1,2,3
Worker2 -> Reads 0,1,2,3
Worker3 -> Reads 0,1,2,3
Not useful here:
Worker0 -> Reads 0
Worker1 -> Reads 1
Worker2 -> Reads 2
Worker3 -> Reads 3
Say I also want to pass the length of this array, then I would add const int length to the arguments of my kernel, but I'm unsure why I would omit the global qualifier except because I have seen other people do it. After all, length must be accessible for all workers. If I'm right, then length would have a generic address space, but again, I don't really know what that means.
When you don't specify a qualifier in the kernel parameter it typically defaults to constant
, which is what you want for those small elements, to have a fast access by all workers.
The rules normally OpenCL compilers follow for kernel parameters is: if it only read and fits in constant, constant, otherwise global.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With