I understand that shared memory on GPU does not persist across different kernels. However, does the L1 cache persist across different kernel calls?
GPUs provide high-bandwidth/low-latency on-chip shared memory and L1 cache to efficiently service a large number of concurrent memory requests. Specifically, concurrent memory requests accessing contiguous memory space are coalesced into warp-wide accesses.
Figure 1 shows that a CUDA kernel is a function that is executed on the GPU. The parallel portion of your application is executed K times in parallel by K different CUDA threads, rather than only once like a regular C/C++ function.
Figure 1. The kernel is a function executed on the GPU.
__global__ : a qualifier added to standard C/C++. It tells the compiler that a function should be compiled to run on the device (GPU) instead of the host (CPU).
This means that data that is processed by the GPU must be moved from the CPU to the GPU before the computation starts, and the results of the computation must be moved back to the CPU once processing has completed.
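As a rough sketch of that flow (the kernel name, array size, and scaling factor below are purely illustrative), a typical program copies its input to the device, launches a __global__ kernel across K threads, and copies the result back:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative kernel: each of the K launched threads scales one element,
// so the body runs K times in parallel rather than once.
__global__ void scale(float *data, float factor, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)                 // guard threads that fall past the end of the array
        data[idx] *= factor;
}

int main()
{
    const int n = 1 << 20;                       // arbitrary size: 1M elements
    const size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data;
    cudaMalloc(&d_data, bytes);

    // Host -> device copy: input must be on the GPU before the kernel starts.
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, 2.0f, n); // K = blocks * threads parallel executions

    // Device -> host copy: results come back to the CPU after the kernel finishes.
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    printf("h_data[0] = %f\n", h_data[0]);       // expect 2.0

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```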
The SM L1 cache is invalidated between all operations on the same stream or the null stream to guarantee coherence. But it doesn't really matter, because the L1 cache on GPUs is not really designed to improve temporal locality within a given thread of execution. On a massively parallel processor, it is parallel spatial locality that matters: you want threads that execute near each other to access data that is near each other.
When a cached memory load is performed, it is done for a single warp, and the cache stores cache line(s) that are accessed by threads in that warp (ideally only a single line). If the next warp accesses the same cache line(s), then the cache will hit and latency will be reduced. Otherwise, the cache will be updated with different cache lines. If memory accesses are very spread out, then later warps will probably evict cache lines from earlier warps before they get reused.
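A small sketch of the difference (the kernel names and the stride parameter are made up for illustration, and either kernel could be launched from host code like the example above):

```cuda
#include <cuda_runtime.h>

// Adjacent threads in a warp read adjacent floats: the warp's 32 four-byte
// loads fall into a single 128-byte cache line, so one transaction serves
// the whole warp and the line is fully used before it can be evicted.
__global__ void copy_coalesced(const float *in, float *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = in[idx];
}

// Each thread jumps `stride` elements apart: a single warp can touch up to
// 32 different cache lines, and later warps tend to evict those lines
// before any of the remaining data in them is reused.
__global__ void copy_strided(const float *in, float *out, int n, int stride)
{
    int idx = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (idx < n)
        out[idx] = in[idx];
}
```

Launched with the same grid configuration, both kernels issue the same number of loads, but the strided version makes far less effective use of each cache line it pulls in.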
By the time another kernel runs, any data still in the cache is unlikely to be useful, because that SM will probably have executed many warps of the previous kernel in the meantime, so whether the cache persists doesn't really matter.