I'm developing a CUDA app in which the kernel has to access global memory many times. This memory is accessed randomly by all CTAs (there is no locality, so I cannot use shared memory). I need to optimize it. I've heard that texture memory can alleviate this problem, but can a kernel both read from and write to texture memory? Does that apply to 1D textures? 2D textures? And what about CUDA arrays?
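CUDA textures are read-only from within a kernel. Texture reads are cached, so any performance gain depends on how often your accesses hit the cache; with truly random access the benefit is not guaranteed.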
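To make the read-only path concrete, here is a minimal sketch of a scattered gather that reads a lookup table through the texture cache. It assumes the texture object API (`cudaTextureObject_t`, which needs CUDA 5.0 or later and Compute Capability 3.0 or higher) rather than the older bound texture references; the kernel name `gather`, the table contents, and the index pattern are made up for illustration. Whether this beats plain global loads depends on your access pattern, so measure it.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: out[i] = table[idx[i]], with the table read via the texture path.
__global__ void gather(cudaTextureObject_t table, const int* idx, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<float>(table, idx[i]);
}

int main() {
    const int n = 1 << 16;
    std::vector<float> h_table(n), h_out(n);
    std::vector<int> h_idx(n);
    for (int i = 0; i < n; ++i) {
        h_table[i] = 0.5f * i;
        h_idx[i] = (int)((i * 2654435761u) % n);  // scattered, illustrative indices
    }

    float *d_table, *d_out;
    int* d_idx;
    cudaMalloc(&d_table, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMemcpy(d_table, h_table.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Describe the linear device buffer that backs the texture.
    cudaResourceDesc res;
    memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_table;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    // Return raw element values (no normalization or filtering).
    cudaTextureDesc tex;
    memset(&tex, 0, sizeof(tex));
    tex.readMode = cudaReadModeElementType;

    cudaTextureObject_t texObj = 0;
    cudaCreateTextureObject(&texObj, &res, &tex, nullptr);

    gather<<<(n + 255) / 256, 256>>>(texObj, d_idx, d_out, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Verify the gather against the host copy of the table.
    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_table[h_idx[i]]) ++errors;
    printf("mismatches: %d\n", errors);

    cudaDestroyTextureObject(texObj);
    cudaFree(d_table);
    cudaFree(d_out);
    cudaFree(d_idx);
    return errors != 0;
}
```

Error checking on the runtime calls is omitted to keep the sketch short; real code should check every return value.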
From CUDA Toolkit 3.1 onwards there are also writable textures, known as surfaces, but they are available only on devices with Compute Capability 2.0 or higher. Surfaces are just like textures, with the advantage that the kernel can also write to them.
Surfaces can only be bound to a cudaArray created with the flag cudaArraySurfaceLoadStore.
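As a rough illustration of the read-and-write case, the sketch below allocates a 2D `cudaArray` with `cudaArraySurfaceLoadStore` and has a kernel read and update each element in place. It assumes the surface object API (`cudaSurfaceObject_t`, CUDA 5.0 or later, Compute Capability 3.0 or higher) instead of the bound surface references the answer refers to; the kernel name `increment` and the array size are hypothetical.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element of a 2D float surface.
__global__ void increment(cudaSurfaceObject_t surf, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        float v;
        // The x coordinate of surface accesses is a byte offset, not an element index.
        surf2Dread(&v, surf, x * (int)sizeof(float), y);
        surf2Dwrite(v + 1.0f, surf, x * (int)sizeof(float), y);
    }
}

int main() {
    const int width = 64, height = 32;
    std::vector<float> h(width * height);
    for (int i = 0; i < width * height; ++i) h[i] = (float)i;

    // The array must be created with cudaArraySurfaceLoadStore to back a surface.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, width, height, cudaArraySurfaceLoadStore);
    cudaMemcpy2DToArray(arr, 0, 0, h.data(), width * sizeof(float),
                        width * sizeof(float), height, cudaMemcpyHostToDevice);

    // Wrap the array in a surface object.
    cudaResourceDesc res;
    memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaSurfaceObject_t surf = 0;
    cudaCreateSurfaceObject(&surf, &res);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    increment<<<grid, block>>>(surf, width, height);

    cudaMemcpy2DFromArray(h.data(), width * sizeof(float), arr, 0, 0,
                          width * sizeof(float), height, cudaMemcpyDeviceToHost);

    // Each element should now be its original value plus one.
    int errors = 0;
    for (int i = 0; i < width * height; ++i)
        if (h[i] != (float)i + 1.0f) ++errors;
    printf("mismatches: %d\n", errors);

    cudaDestroySurfaceObject(surf);
    cudaFreeArray(arr);
    return errors != 0;
}
```

Each thread here touches only its own element. If several threads may write the same location, surface writes give no ordering guarantees, so such a kernel would need a different scheme.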