I would like to understand the difference between creating a read-only buffer and using it in the kernel with the __constant address space qualifier, versus using it with the const __global address space qualifier.
Here is what I have found so far. These do not really answer my question, but they contain some useful information:
https://community.khronos.org/t/constant-vs-const-global
Is the access performance of __constant memory as same as __global memory on OpenCL
Using __constant qualifer in OpenCL kernels
If I understand correctly, the allocation in the GPU's memory happens at the clCreateBuffer call. What I do not understand is how the compiler decides whether the buffer is placed in constant memory (which has a 64 KB limit) or in global memory. (I know that in most cases constant memory is part of the global memory space.) If it depends on the address space qualifier, that would mean the 64 KB limit can be bypassed by using const __global.
Is there any difference in performance between __constant and const __global? __global memory may be cached, so both are read-only and (possibly) cached.
(Source: 3.3 Memory Model/Global memory section and Figure 3.3; http://www.khronos.org/registry/cl/specs/opencl-1.x-latest.pdf#page=24)
In my experience there is no conceptual difference between the two: both imply that the data pointed to is read-only. Any difference only becomes apparent in the vendor's implementation.
For example, on NVIDIA GPUs memory marked __constant is cached (I believe the cache is 8 KB per multiprocessor on all current devices). Note that accesses to this cache are serialized when different work-items access different addresses, so I've found it most useful for passing structs of parameters that are constant within a work-group. The section on constant memory in the CUDA programming guide gives a better idea of how this works. Memory marked const __global is, I believe, not cached; the qualifier simply tells the compiler to raise an error if you try to modify the pointed-to values.
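The serialization point can be illustrated with a toy cost model in plain C. This is only a sketch of the behaviour described above, not a measurement of any device: the function name constant_read_passes is made up for illustration, and the model simply assumes that one group of work-items needs one pass through the constant cache per distinct address it reads.

```c
#include <stddef.h>

/* Hypothetical toy model (an assumption, not vendor documentation):
 * the number of serialized constant-cache passes for one group of
 * work-items is taken to be the number of distinct addresses they read.
 * addr[i] is the address (or element index) read by work-item i. */
size_t constant_read_passes(const size_t *addr, size_t n)
{
    size_t passes = 0;
    for (size_t i = 0; i < n; ++i) {
        int seen = 0;
        /* Has an earlier work-item already requested this address? */
        for (size_t j = 0; j < i; ++j) {
            if (addr[j] == addr[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            ++passes; /* a new address costs one more pass */
    }
    return passes;
}
```

Under this model, work-items that all read the same parameter struct cost a single pass, while work-items that each index a different element cost one pass per work-item.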
I'm not sure whether AMD does a similar kind of caching on its hardware.
Hope that helps.
For Intel's (formerly Altera's) SDK for OpenCL on FPGAs, constant memory is loaded into an on-chip constant cache that is shared by all work-groups. The size of this cache is 16 KB by default, but it can be changed by adding the -const-cache-bytes=<N> flag (with <N> being the constant cache size in bytes) to your aoc command. Page 138 of their Best Practices Guide also mentions the following:
Unlike global memory accesses that have extra hardware for tolerating long memory latencies, the constant cache suffers large performance penalties for cache misses. If the __constant arguments in your OpenCL kernel code cannot fit in the cache, you might achieve better performance with __global const arguments instead.
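A host-side sketch of that advice in plain C: choose the argument qualifier by comparing the data size with the available constant capacity. The helper name pick_qualifier is hypothetical, and the capacity is passed in as a plain parameter. How you obtain it is an assumption on my part: on a typical device it could come from clGetDeviceInfo with CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, and in the FPGA flow from the constant cache size chosen at compile time.

```c
#include <stddef.h>

/* Hypothetical helper: returns the qualifier to use for a read-only
 * kernel argument. Data that fits in the constant capacity uses
 * __constant; anything larger falls back to __global const, following
 * the guide's remark about cache misses. */
const char *pick_qualifier(size_t data_bytes, size_t constant_capacity_bytes)
{
    return (data_bytes <= constant_capacity_bytes) ? "__constant"
                                                   : "__global const";
}
```

For instance, with the default 16 KB cache mentioned above, a 4 KB lookup table would stay __constant, while a 64 KB table would be passed as __global const. The result could then select between two variants of the same kernel.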