Does it make sense to rewrite the code so that it loads data through the texture cache (assuming that I don't need filtering or other texture unit features), or is it the same? How about loading some data through the L1 cache and some through the texture unit? I have code where I could use such a strategy, but does it make sense at all?
To make it clear, I meant: is the texture cache on Fermi separate hardware from the L1 cache? In other words, can I cleverly get a total of L1 + texture cache capacity for my code?
It is separate. A texture load does not go through L1.
For non-texturing applications (i.e. you're not using features like interpolation and clamping), the principal benefit of texturing is that it lets you selectively route a large region of global memory through a cache of its own (useful if there is locality and reuse) without disrupting what is going on in L1.
For small data sets, texturing will not give better performance than L1. For large data sets with some locality and reuse, where loads from the textured region would otherwise thrash the L1 (which might be as small as 16 KB per SM on Fermi, depending on the cache configuration), the texture cache can benefit the application overall. It's not uncommon for users to find that texturing isn't quite as fast as having everything cached in L1, but is a lot faster than uncached loads or scattered loads that thrash the L1. A lot will depend on the access pattern and the sizes of the data involved.
The size of the texture cache is on the order of 8 KB per SM. A texture can cover a much larger region than that, but a high level of reuse and locality will definitely improve the hit rate of the texture cache. Also note that texture memory is read-only.
You might be interested in this webinar.
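To illustrate the "some data through L1, some through the texture unit" split described above, here is a minimal sketch of a gather kernel. It reads a large lookup table through the texture path while the index and output arrays use ordinary global memory accesses. The names `gather_tex` and `run_gather_demo`, the array sizes and the index pattern are all made up for illustration. Note one assumption in particular: the sketch uses the texture object API, which needs compute capability 3.0 or later. On Fermi itself the equivalent code would bind a texture reference with `cudaBindTexture`, an API that has since been removed from recent CUDA toolkits.

```cuda
#include <cuda_runtime.h>
#include <vector>

// `table` is fetched through the texture unit; `idx` and `out` use
// ordinary global loads and stores, so they do not share its cache.
__global__ void gather_tex(cudaTextureObject_t table, const int* idx,
                           float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch<float>(table, idx[i]);
}

// Hypothetical driver: returns the number of mismatching results,
// or -1 if any CUDA call fails.
int run_gather_demo()
{
    const int m = 1 << 16;   // table entries (256 KB, far larger than the cache)
    const int n = 1 << 20;   // number of gathered elements

    std::vector<float> h_table(m), h_out(n);
    std::vector<int> h_idx(n);
    for (int i = 0; i < m; ++i) h_table[i] = 0.5f * i;
    for (int i = 0; i < n; ++i) h_idx[i] = (i * 7) % m;  // illustrative pattern

    float* d_table = nullptr;
    float* d_out = nullptr;
    int* d_idx = nullptr;
    cudaTextureObject_t tex = 0;

    bool ok = cudaMalloc(&d_table, m * sizeof(float)) == cudaSuccess
           && cudaMalloc(&d_idx, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_out, n * sizeof(float)) == cudaSuccess;
    ok = ok && cudaMemcpy(d_table, h_table.data(), m * sizeof(float),
                          cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(d_idx, h_idx.data(), n * sizeof(int),
                          cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // Describe the linear device buffer that the texture will read.
        cudaResourceDesc res = {};
        res.resType = cudaResourceTypeLinear;
        res.res.linear.devPtr = d_table;
        res.res.linear.desc = cudaCreateChannelDesc<float>();
        res.res.linear.sizeInBytes = m * sizeof(float);

        // No filtering or normalization: return the stored element as-is.
        cudaTextureDesc td = {};
        td.readMode = cudaReadModeElementType;

        ok = cudaCreateTextureObject(&tex, &res, &td, nullptr) == cudaSuccess;
    }

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        gather_tex<<<blocks, threads>>>(tex, d_idx, d_out, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess;
    }
    ok = ok && cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    if (tex) cudaDestroyTextureObject(tex);
    cudaFree(d_table);
    cudaFree(d_idx);
    cudaFree(d_out);
    if (!ok) return -1;

    // Compare against the same gather done on the host.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_table[h_idx[i]]) ++mismatches;
    return mismatches;
}
```

Calling `run_gather_demo()` from `main` only checks correctness. Whether the texture path is actually faster than plain loads for a given access pattern is something to measure with a profiler, since, as the answer says, it depends on locality, reuse and data size.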