 

What is the L1 cache used for in NVIDIA's Maxwell GPUs?

Tags: caching, cuda

NVIDIA's Maxwell GPUs have been out for a while, yet while reading the "Maxwell Tuning Guide" I became confused about the role of the L1 cache. In the Kepler era, global memory accesses were cached in L2 only, and L1 was used to cache local memory accesses, which are caused by register spilling. From reading NVIDIA's documentation, local memory caching is the only thing I know of that benefits from the L1 cache. However, in section 1.4.2.1 of the "Maxwell Tuning Guide", NVIDIA says:

As with Kepler, global loads in first-generation Maxwell are cached in L2 only ... Local loads also are cached in L2 only

CUDA 6.0 added two new device attributes, globalL1CacheSupported and localL1CacheSupported, to check whether a device supports L1 caching of global memory and of local memory. I tested these two attributes on both a GTX 780 and a GTX 980, and the result confused me even more:

           globalL1CacheSupported    localL1CacheSupported
GTX 780              1                         1
GTX 980              0                         0

The result from the GTX 980 confirms the statement in the "Maxwell Tuning Guide", which puzzles me: if that is the case, what is the L1 cache used for? Another thing I cannot understand is that the GTX 780 is a GK110 card, and according to the GK110 white paper, Kepler also caches its global memory accesses in L2 only, so it does not make sense to me that globalL1CacheSupported returns 1 for a GTX 780. I hope someone can clear up my confusion.
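
The attribute query behind the table above can be sketched roughly as follows. This is a minimal sketch, not the exact code used for the test: it assumes the values are read with cudaDeviceGetAttribute using the cudaDevAttrGlobalL1CacheSupported and cudaDevAttrLocalL1CacheSupported enumerators (the same values are also exposed as the globalL1CacheSupported and localL1CacheSupported fields of cudaDeviceProp), and queryAttr is a hypothetical helper name introduced only for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: read one integer device attribute.
// Returns -1 if the runtime reports an error (for example, no such device).
static int queryAttr(cudaDeviceAttr attr, int device) {
    int value = 0;
    cudaError_t err = cudaDeviceGetAttribute(&value, attr, device);
    return (err == cudaSuccess) ? value : -1;
}

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::printf("No CUDA device found\n");
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices whose properties cannot be read
        }
        int globalL1 = queryAttr(cudaDevAttrGlobalL1CacheSupported, dev);
        int localL1  = queryAttr(cudaDevAttrLocalL1CacheSupported, dev);
        std::printf("%s (compute capability %d.%d): "
                    "globalL1CacheSupported=%d localL1CacheSupported=%d\n",
                    prop.name, prop.major, prop.minor, globalL1, localL1);
    }
    return 0;
}
```

Compiled with nvcc and run on a machine with both cards installed, this prints one line per device, which is how a table like the one above could be produced.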

Xiangyu.Guo asked Mar 06 '15 08:03



1 Answer

On Maxwell, the L1 functionality has been combined with the texture cache. This is referred to in the tuning guide as well.

Fermi devices introduced the L1, which was used for global and local load caching. L1 was a write-through cache, so it had relatively little impact on global and local stores.

With Kepler, L1 was disabled for global loads, but still in effect for local loads.

"then what is L1 cache used for?"

With Maxwell, the default behavior of the L1 with respect to global loads is the same - they are not cached. However, you can "opt-in" to have the global loads cached in L1, as spelled out in the Maxwell tuning guide you have referred to:

"In a manner similar to Kepler GK110B, GM204 retains this behavior by default but also allows applications to opt-in to caching of global loads in its unified L1/Texture cache. The opt-in mechanism is the same as with GK110B: pass the -Xptxas -dlcm=ca flag to nvcc at compile time."

GK110B was the variant of GK110 that showed up in K40 devices. On K20/K20X, the L1 behavior was not modifiable (off for global loads). On K40, the default behavior of L1 was the same as on K20/K20X, but that default can be overridden to turn L1 on for global loads.
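
To make the opt-in concrete, here is a minimal sketch of a kernel that does plain global loads, with the two build commands in the header comment. The file name scale.cu, the kernel scale, and the helper runScale are hypothetical names chosen for this example, and -arch=sm_52 assumes a GM204 part such as the GTX 980. The flag only changes how ptxas emits the global loads, so the results should be identical either way; any difference would show up in timing or profiler counters, not in the output.

```cuda
// Default build (global loads are not cached in L1 on GK110B / GM204):
//   nvcc -arch=sm_52 -o scale scale.cu
// Opt-in build, using the flag quoted from the tuning guide:
//   nvcc -arch=sm_52 -Xptxas -dlcm=ca -o scale scale.cu
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread performs one global load and one global store.
__global__ void scale(const float* in, float* out, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = factor * in[i];
    }
}

// Hypothetical helper: runs the kernel on n elements and returns the number
// of wrong results, or -1 if any CUDA call fails (for example, no device).
static int runScale(int n) {
    std::vector<float> host(n, 1.0f);
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;

    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return -1; }

    cudaError_t err = cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scale<<<(n + 255) / 256, 256>>>(dIn, dOut, 2.0f, n);
        err = cudaGetLastError();  // catches launch failures
    }
    if (err == cudaSuccess) {
        // The blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(host.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int wrong = 0;
    for (float v : host) {
        if (v != 2.0f) ++wrong;
    }
    return wrong;
}

int main() {
    int wrong = runScale(1 << 20);
    if (wrong < 0) {
        std::printf("CUDA error or no device available\n");
        return 1;
    }
    std::printf("mismatches: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

Building the same source both ways and comparing the two binaries in a profiler is one way to check whether the opt-in actually changes cache behavior on a given card.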

Robert Crovella answered Nov 12 '22 21:11
