
CUDA: Is texture memory still useful to speed up access times for compute capability 2.x and newer?

Tags:

cuda

I'm writing an image processing app where I have to fetch pixel data in an uncoalesced manner.

Initially I implemented my algorithm using global memory. Later I reimplemented it using texture memory. To my amazement it became slower! I thought maybe something was wrong with my cudaMalloc/tex1Dfetch approach, so I changed it to cudaArray/tex2D. Nothing changed.
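
For context, the cudaArray/tex2D variant looked roughly like this (a stripped-down sketch rather than my actual code: the image size and the per-pixel operation are placeholders, error checking is omitted, and it uses the legacy texture reference API that was current at the time):

    #include <cuda_runtime.h>

    // Legacy texture reference API (the usual way on compute capability 2.x).
    texture<unsigned char, cudaTextureType2D, cudaReadModeElementType> texPixels;

    __global__ void process(unsigned char* out, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height)
            out[y * width + x] = 255 - tex2D(texPixels, x, y);  // placeholder per-pixel op
    }

    void run(const unsigned char* hostImg, unsigned char* hostOut, int w, int h)
    {
        // Pixels live in a cudaArray so the texture unit can use its own
        // cache-friendly (block-linear) layout.
        cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned char>();
        cudaArray* arr = 0;
        cudaMallocArray(&arr, &desc, w, h);
        cudaMemcpyToArray(arr, 0, 0, hostImg, w * h, cudaMemcpyHostToDevice);
        cudaBindTextureToArray(texPixels, arr);

        unsigned char* dOut = 0;
        cudaMalloc(&dOut, w * h);

        dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
        process<<<grid, block>>>(dOut, w, h);
        cudaMemcpy(hostOut, dOut, w * h, cudaMemcpyDeviceToHost);

        cudaUnbindTexture(texPixels);
        cudaFreeArray(arr);
        cudaFree(dOut);
    }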

Then I stumbled upon Shane Cook's "CUDA Programming", where he wrote:

As compute 1.x hardware has no cache to speak of, the 6–8K of texture memory per SM provides the only method to truly cache data on such devices. However, with the advent of Fermi and its up to 48 K L1 cache and up to 768 K shared L2 cache, this made the usage of texture memory for its cache properties largely obsolete. The texture cache is still present on Fermi to ensure backward compatibility with previous generations of code.

I have GeForce GT 620M (Fermi, compute cap. 2.1).

So I need some advice from professionals! Should I dig deeper into texture memory and its texture cache to try to optimize performance, or should I rather stick with global memory and the L1/L2 caches?

asked Oct 30 '13 by Antonio

1 Answer

Textures can indeed be useful on devices of compute capability >= 2.0.

Textures and cudaArrays can use memory laid out along a space-filling curve, which can allow for a better cache hit rate due to better 2D spatial locality.

The texture cache is separate from the other caches: it has its own dedicated storage and bandwidth, and reading through it does not interfere with them. This can become important if there is a lot of pressure on your L1/L2 caches.

Textures also provide built-in functionality such as interpolation, various addressing modes (clamp, wrap, mirror), and normalized addressing with floating-point coordinates. These can be used without any extra cost and can greatly improve performance in kernels where such functionality is needed.
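
For example, turning those features on costs only a few lines of host-side setup on a texture reference, and the hardware then does the filtering and wrapping with no extra instructions in the kernel. A sketch (not tied to your application):

    // Sketch: a float texture sampled with bilinear filtering, wrap addressing
    // and normalized [0,1) coordinates.
    texture<float, cudaTextureType2D, cudaReadModeElementType> texIn;

    __global__ void resample(float* out, int outW, int outH)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < outW && y < outH) {
            float u = (x + 0.5f) / outW;             // normalized coordinates
            float v = (y + 0.5f) / outH;
            out[y * outW + x] = tex2D(texIn, u, v);  // hardware bilinear interpolation
        }
    }

    // Host-side setup, done before the texture is bound and the kernel launched:
    //   texIn.filterMode     = cudaFilterModeLinear;  // bilinear interpolation
    //   texIn.addressMode[0] = cudaAddressModeWrap;   // wrap out-of-range u
    //   texIn.addressMode[1] = cudaAddressModeWrap;   // wrap out-of-range v
    //   texIn.normalized     = 1;                     // coordinates in [0,1)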

On early CUDA architectures, textures and cudaArrays could not be written by a kernel. On architectures of compute capability >= 2.0, they can be written via CUDA surfaces.
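
For instance, a kernel can write into a cudaArray through a surface reference roughly as follows (a sketch; the array must be created with the cudaArraySurfaceLoadStore flag, and surface x-coordinates are given in bytes):

    // Sketch: writing into a cudaArray from a kernel via a surface reference
    // (compute capability >= 2.0).
    surface<void, cudaSurfaceType2D> surfOut;

    __global__ void fill(int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height) {
            float value = (float)x / width;                    // placeholder value
            surf2Dwrite(value, surfOut, x * sizeof(float), y); // x is in bytes
        }
    }

    // Host side (sketch):
    //   cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    //   cudaArray* arr;
    //   cudaMallocArray(&arr, &desc, width, height, cudaArraySurfaceLoadStore);
    //   cudaBindSurfaceToArray(surfOut, arr);
    //   fill<<<grid, block>>>(width, height);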

Determining if you should use textures or a regular buffer in global memory comes down to the intended usage and access patterns for the memory. It will be project specific.

You are using the Fermi architecture, with a device that has been rebranded into the 6xx series.

For those on the Kepler architecture, take a look at NVIDIA's Inside Kepler presentation. In particular, see the slides Texture Performance, Texture Cache Unlocked, and const __restrict Example.
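
To give a flavour of that last slide: on SM 3.5+ Kepler parts the compiler can route loads through the read-only (texture) data cache when it can prove a pointer is only read, which is what const plus __restrict__ tells it. A sketch (not the code from the presentation):

    // Sketch: marking read-only kernel inputs as const ... __restrict__ lets
    // the compiler issue loads through the read-only (texture) data cache on
    // SM 3.5+ Kepler parts, with no texture binding at all.
    __global__ void saxpy(float a,
                          const float* __restrict__ x,
                          const float* __restrict__ y,
                          float*       __restrict__ out,
                          int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = a * x[i] + y[i];  // x[i], y[i] may be served by the texture cache
    }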

answered Sep 21 '22 by Roger Dahl