
OpenCL: Difference between __constant memory and const __global memory

I would like to understand the difference between two ways of using a buffer created with the read-only property: accessing it in the kernel through the __constant address space qualifier, or accessing it through the const __global address space qualifier.

What I have found so far does not really answer my question, but it contains some useful information:

  • https://community.khronos.org/t/constant-vs-const-global

  • Is the access performance of __constant memory as same as __global memory on OpenCL

  • Using __constant qualifer in OpenCL kernels

If I understand correctly, the allocation in the GPU's memory happens at the clCreateBuffer call. What I do not understand is how the compiler decides whether the buffer lives in constant memory (which has a 64 KB limit) or in global memory. (I know that in most cases constant memory is part of the global memory space.) If that depends on the address space qualifier, then the 64 KB limit could be sidestepped by using const __global.

Is there any difference in performance between __constant and const __global? __global memory may be cached, so both are read-only and (possibly) cached. (Source: 3.3 Memory Model/Global memory section and Figure 3.3; http://www.khronos.org/registry/cl/specs/opencl-1.x-latest.pdf#page=24)

Balazs Koszegi asked Aug 01 '13 10:08


2 Answers

In my experience there is no conceptual difference between the two: both imply that the data pointed to is read-only. Any difference only becomes apparent in how a particular vendor implements them.

For example, on NVIDIA GPUs memory marked __constant is cached (I believe the cache is 8 KB per multiprocessor on all current devices). Note that accesses to this cache are serialized when different work items access different addresses, so I've found it most useful for passing structs of parameters that are constant within a work group. The section on constant memory in the CUDA programming guide gives a better idea of how this works. Memory marked const __global is, I believe, not cached; the qualifier simply tells the compiler to raise an error if you try to modify the pointed-to values.
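To make the access-pattern point concrete, here is a minimal sketch rather than anything taken from a vendor sample. It keeps OpenCL C kernel source in a C string, the way host code would hand it to clCreateProgramWithSource. The names FilterParams, apply_gain, lut, indices and get_kernel_source are all hypothetical. The parameter struct is read at the same address by every work item, so it is the kind of data that suits __constant; the lookup table is indexed by per-item data, so it is declared const __global instead.

```c
#include <assert.h>
#include <string.h>

/* OpenCL C source kept as a host-side string. All kernel and type names
 * are hypothetical and exist only for this illustration. */
static const char *const kernel_source =
    "typedef struct { float gain; float offset; } FilterParams;\n"
    "\n"
    "__kernel void apply_gain(__constant FilterParams *params,\n"
    "                         const __global float *lut,\n"
    "                         const __global int *indices,\n"
    "                         __global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    /* Uniform read: every work item reads the same address. */\n"
    "    float g = params->gain;\n"
    "    /* Divergent read: the address depends on per-item data. */\n"
    "    out[i] = g * lut[indices[i]] + params->offset;\n"
    "}\n";

/* Returns the source and stores its length, mirroring the (strings, lengths)
 * pair that clCreateProgramWithSource takes. */
const char *get_kernel_source(size_t *length)
{
    assert(length != NULL);
    *length = strlen(kernel_source);
    return kernel_source;
}
```

Whether the divergent read is actually slower through __constant depends on the device, so this split is a starting point to measure against, not a rule.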

I'm not sure whether AMD does a similar kind of caching on its hardware.

Hope that helps

mcd40 answered Oct 10 '22 17:10


For Intel's (formerly Altera's) SDK for OpenCL on FPGAs, constant memory is loaded into an on-chip constant cache that is shared by all work-groups. The cache is 16 KB by default, but its size can be changed by adding the -const-cache-bytes=<N> flag (where <N> is the constant cache size in bytes) to your aoc command. Page 138 of their Best Practices Guide also mentions the following:

Unlike global memory accesses that have extra hardware for tolerating long memory latencies, the constant cache suffers large performance penalties for cache misses. If the __constant arguments in your OpenCL kernel code cannot fit in the cache, you might achieve better performance with __global const arguments instead.
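One way to act on that advice is to decide on the host which address space the table should use, then select it when the program is built. The following is a minimal sketch under stated assumptions: the macro USE_CONSTANT_TABLE, the kernel name lookup and both helper functions are hypothetical, and the budget is passed in as a plain number. In real host code that number might come from clGetDeviceInfo with CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, or from the value given to -const-cache-bytes.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C source whose table argument switches address space according to
 * a build-time macro. Macro and kernel names are hypothetical. */
const char *const lookup_kernel_source =
    "#if USE_CONSTANT_TABLE\n"
    "#define TABLE_SPACE __constant\n"
    "#else\n"
    "#define TABLE_SPACE const __global\n"
    "#endif\n"
    "__kernel void lookup(TABLE_SPACE float *table,\n"
    "                     const __global int *indices,\n"
    "                     __global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = table[indices[i]];\n"
    "}\n";

/* Returns 1 if the table fits within the constant budget, 0 otherwise. */
int table_fits_constant(size_t table_bytes, size_t constant_budget_bytes)
{
    assert(constant_budget_bytes > 0);
    return table_bytes <= constant_budget_bytes;
}

/* Picks the string to pass as the options argument of clBuildProgram. */
const char *table_build_option(size_t table_bytes, size_t constant_budget_bytes)
{
    return table_fits_constant(table_bytes, constant_budget_bytes)
               ? "-D USE_CONSTANT_TABLE=1"
               : "-D USE_CONSTANT_TABLE=0";
}
```

With a 16 KB budget, a 4 KB table would be built as __constant and a 64 KB table as const __global. The host-side buffer is created the same way in both cases; only the kernel's qualifier changes.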

lil' wing answered Oct 10 '22 19:10