How does the L2 cache work in GPUs with the Kepler architecture in terms of locality of reference? For example, if a thread accesses an address in global memory and the value at that address is not in the L2 cache, how is the value cached? Is the caching only temporal, or are values at nearby addresses also brought into the L2 cache (spatial)?
The picture below is from an NVIDIA whitepaper.
A unified L2 cache was introduced with compute capability 2.0 and is still present on the Kepler architecture. Its replacement policy is LRU (least recently used), and its main purpose is to reduce pressure on global memory bandwidth. A GPU application can exploit both types of locality (temporal and spatial).
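As a rough illustration of the temporal side (a hypothetical sketch, not taken from the whitepaper; the kernel and the names apply_lut and lut are made up for this example): many threads index into the same small lookup table in global memory. After the first warps miss and the table's lines are loaded into L2, later reads from any SM tend to hit those lines because the LRU policy keeps recently used data resident.

    __global__ void apply_lut(const float *in, float *out, const float *lut, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            // Assumed 256-entry table: 256 * 4 bytes = 1 KB, far smaller than
            // Kepler's L2 (up to 1.5 MB), so repeated reads from many warps keep
            // hitting the same cached lines (temporal locality).
            int idx = (int)in[i] & 255;
            out[i] = lut[idx];
        }
    }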
Whenever a thread attempts to read a specific memory location, the hardware first looks in the L1 and L2 caches; if the data is not found there, it loads an entire 128-byte cache line from global memory. This is the default mode. Because a whole line is fetched, values at neighbouring addresses are cached along with the requested one (spatial locality), while the LRU policy keeps recently used lines resident (temporal locality). The diagram below shows why a 128-byte access pattern gives the best result.
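A minimal sketch of the spatial-locality side (hypothetical kernels, not from the whitepaper): when the 32 threads of a warp read consecutive 4-byte floats, the whole warp is served by a single 128-byte cache line, whereas a 128-byte stride per thread puts every thread of the warp onto a different line.

    // Coalesced: warp k touches in[32k] .. in[32k+31], i.e. one 128-byte line.
    __global__ void coalesced_copy(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i];
    }

    // Strided: each thread of a warp lands in a different 128-byte line, so one
    // warp-level read pulls 32 lines instead of 1
    // (assumes 'in' holds at least 32 * n floats).
    __global__ void strided_copy(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i * 32];
    }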