I have searched other threads about using the L2 cache in CUDA but could not find a solution. How do I make use of the L2 cache? Is there a function call or declaration for it? For example, shared memory is declared with __device__ __shared__. Is there anything like that for the L2 cache?
The L2 cache is transparent to device code. All accesses to memory (global, local, surface, texture, constant, and instruction) that do not hit in L1 go to L2. All writes go through L2.
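Since there is nothing to declare, about the only L2-related thing device or host code can do directly is ask how large the cache is. The following is a minimal sketch, assuming device 0 is the GPU of interest; it uses the runtime call cudaGetDeviceProperties and the l2CacheSize field of cudaDeviceProp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Assumption: device 0 is the GPU we care about.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // l2CacheSize is reported in bytes.
    printf("%s: L2 cache size = %d bytes\n", prop.name, prop.l2CacheSize);
    return 0;
}
```

Ordinary global loads and stores in a kernel then pass through this cache without any extra qualifier.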
CUDA C Programming Guide F.4.2 : Global Memory
This section provides a few more details on L2.
The ptxas option -dlcm=cg (passed through nvcc as -Xptxas -dlcm=cg) can be used to make global loads bypass L1 and be cached only in L2.
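As a rough illustration, the flag is forwarded to ptxas from the nvcc command line, as in the comment at the top of the sketch below. The file name copy.cu and the kernel name copyViaL2 are made up for this example. The kernel also uses the __ldcg() intrinsic, which is not mentioned in the answer above: it requests the same "cache in L2, not L1" behaviour for a single load, and its availability depends on the toolkit version and the device, so treat it as an assumption to check against the Programming Guide.

```cuda
// Build sketch (hypothetical file name):
//   nvcc -Xptxas -dlcm=cg -o copy copy.cu
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Copies in[] to out[]. With -dlcm=cg a plain in[i] load would already
// skip L1; __ldcg() asks for that behaviour on this one load only.
__global__ void copyViaL2(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __ldcg(&in[i]);
    }
}

int main()
{
    const int n = 1 << 20;
    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    copyViaL2<<<blocks, threads>>>(d_in, d_out, n);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Count mismatches on the host to confirm the copy worked.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != h_in[i]) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Either way the result of the kernel is the same; only the caching path of the loads changes, so any benefit has to be measured with a profiler.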
CUDA C Programming Guide B.5 : Memory Fence Functions
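The function __threadfence() can be used to make sure that all writes to global memory made by the calling thread before the fence are visible in L2, and therefore to other threads on the device.
The function __threadfence_system() can be used to make sure that all writes to global memory made by the calling thread before the fence are visible to host threads.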
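A common use of __threadfence() is a single-pass reduction in which the last block to finish combines the results of all the others. The sketch below follows that pattern in simplified form; the names blocksDone, sumWithFence, partial and total are hypothetical, and the per-block sum is done serially by thread 0 to keep the example short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical counter of blocks that have published their partial sum.
__device__ unsigned int blocksDone = 0;

// Each block writes one partial sum; the last block to finish adds them up.
// partial is volatile so the final reads are not served from a stale L1 line.
__global__ void sumWithFence(const float* in, int n,
                             volatile float* partial, float* total)
{
    if (threadIdx.x == 0) {
        // Serial sum over this block's slice (kept simple on purpose).
        int begin = blockIdx.x * blockDim.x;
        int end = begin + blockDim.x;
        if (end > n) end = n;
        float s = 0.0f;
        for (int i = begin; i < end; ++i) s += in[i];
        partial[blockIdx.x] = s;

        // Make the partial-sum write visible device-wide before the
        // counter update that other blocks will observe.
        __threadfence();

        // atomicInc returns the old value; the last block sees gridDim.x - 1.
        unsigned int ticket = atomicInc(&blocksDone, gridDim.x);
        if (ticket == gridDim.x - 1) {
            float t = 0.0f;
            for (unsigned int b = 0; b < gridDim.x; ++b) t += partial[b];
            *total = t;
            blocksDone = 0;  // reset so the kernel can be launched again
        }
    }
}

int main()
{
    const int n = 1024, threads = 256;
    const int blocks = (n + threads - 1) / threads;
    std::vector<float> h_in(n, 1.0f);

    float *d_in = nullptr, *d_partial = nullptr, *d_total = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMalloc(&d_total, sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    sumWithFence<<<blocks, threads>>>(d_in, n, d_partial, d_total);

    float h_total = 0.0f;
    cudaMemcpy(&h_total, d_total, sizeof(float), cudaMemcpyDeviceToHost);
    printf("total = %f\n", h_total);

    cudaFree(d_in);
    cudaFree(d_partial);
    cudaFree(d_total);
    return 0;
}
```

Without the fence, the last block could see the incremented counter before another block's partial sum has reached memory that it can read, and add a stale value.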