Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Share large constant data among cuda threads

Tags:

c++

cuda

opencl

I have a kernel which is called multiple times. In each call a constant data of around 240 kbytes will be shared and processed by threads. Threads work independently like a map function. The stalling time of the threads is considerable. The reason behind that can be the bank conflict of memory reads. How can I handle this?(I have GTX 1080 ti)

Can "const global" of opencl handle this? (because constant memory in cuda is limited to 64 kb)

like image 650
user3329035 Avatar asked May 21 '26 17:05

user3329035


1 Answers

In CUDA, I believe the best recommendation would be to make use of the so called "Read-Only" cache. This has at least two possible benefits over the __constant__ memory/constant cache system:

  1. It is not limited to 64kB like __constant__ memory is.
  2. It does not expect or require "uniform access" like the constant cache does, to deliver full access bandwidth/best performance. Uniform access refers to the idea that all threads in a warp are accessing the same location or same constant memory value (per read cycle/instruction).

The read-only cache is documented in the CUDA programming guide. Possibly the easiest way to use it is to decorate your pointers passed to the CUDA kernel with __restrict__ (assuming you are not aliasing between pointers) and to decorate the pointer that refers to the large constant data with const ... __restrict__. This will allow the compiler to generate appropriate LDG instructions for access to constant data, pulling it through the read-only cache mechanism.

This read-only cache mechanism is only supported on cc 3.5 or higher GPUs, but that covers some GPUs in the Kepler generation and all GPUs in the Maxwell, Pascal (including your GTX 1080 ti), Volta, and Turing generations.

If you have a GPU that is less than cc3.5, possibly the best suggestion for similar benefits (larger than __const__, not needing uniform access) then would be to use texture memory. This is also documented elsewhere in the programming guide, there are various CUDA sample codes that demonstrate the use of texture memory, and plenty of questions here on the SO cuda tag covering it as well.

like image 111
Robert Crovella Avatar answered May 23 '26 10:05

Robert Crovella



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!