Texture memory in CUDA: Concept and simple example to demonstrate performance

Question

I am reading the NVIDIA white paper titled Particle Simulation with CUDA by Simon Green.

It describes the SDK particles example and the algorithms used.

While discussing performance of the code, the author says that global memory arrays of position and velocity of the particles are "bound" to textures.

Now I am very confused by the concept of texture memory. The NVIDIA CUDA programming guide goes through some really gory and difficult explanations without any examples.

Hence I have 2 questions:

1. Can someone give me, or refer me to, a really simple ("texture memory for dummies") example of how a texture is used and how it improves performance?
2. The CUDA Programming Guide 4.0 says on page 40: "A texture can be any region of linear memory or a CUDA array". If, as is claimed, texture memory gives better performance than global memory, why not "bind" the entire global memory to texture memory?

talonmies · Accepted Answer

The CUDA SDK contains a straightforward example, simpleTexture, which demonstrates performing a trivial 2D coordinate transformation using a texture.

The first thing to keep in mind is that texture memory is global memory. The only difference is that textures are accessed through a dedicated read-only cache, and that the cache includes hardware filtering which can perform linear floating-point interpolation as part of the read. The cache, however, differs from a conventional cache in that it is optimised for spatial locality (in the coordinate system of the texture) rather than locality in memory. For some applications this is ideal and gives a performance advantage, both from the cache and from the free FLOPs you get from the filtering hardware. For others it is not: textures can be slower, because a miss in the texture cache adds a penalty on top of the global memory read, and the interpolation is not needed.
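To make the "free" interpolation concrete, here is a minimal sketch using the texture object API (available in newer toolkits than the CUDA 4.0 guide mentioned in the question, which used texture references). The names `CHECK`, `sampleKernel` and `sampleTable` are hypothetical helpers invented for this illustration, and the four-entry table is made-up data. It assumes unnormalised coordinates, where texel `i` is centred at `i + 0.5`, so sampling at `x = 1.0` should land halfway between the first two entries.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// One thread reads the texture at a fractional, unnormalised coordinate.
// The interpolation is done by the texture unit, not by arithmetic here.
__global__ void sampleKernel(cudaTextureObject_t tex, float x, float* out)
{
    *out = tex1D<float>(tex, x);
}

// Hypothetical host wrapper: uploads a 4-entry table and samples it at x.
float sampleTable(float x)
{
    const float table[4] = {0.0f, 10.0f, 20.0f, 30.0f};

    // tex1Dfetch on plain linear memory does no filtering, so the data is
    // copied into a CUDA array, which supports filtered tex1D reads.
    cudaChannelFormatDesc fmt = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    CHECK(cudaMallocArray(&arr, &fmt, 4, 0));
    CHECK(cudaMemcpy2DToArray(arr, 0, 0, table, sizeof(table),
                              sizeof(table), 1, cudaMemcpyHostToDevice));

    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc td;
    std::memset(&td, 0, sizeof(td));
    td.addressMode[0]   = cudaAddressModeClamp;
    td.filterMode       = cudaFilterModeLinear;     // hardware interpolation
    td.readMode         = cudaReadModeElementType;
    td.normalizedCoords = 0;                        // coordinates in texels

    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    float* dOut = nullptr;
    CHECK(cudaMalloc(&dOut, sizeof(float)));
    sampleKernel<<<1, 1>>>(tex, x, dOut);
    CHECK(cudaGetLastError());

    float result = 0.0f;
    CHECK(cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost));

    CHECK(cudaFree(dOut));
    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFreeArray(arr));
    return result;
}
```

Calling `sampleTable(2.5f)` samples exactly at the centre of texel 2, while `sampleTable(1.0f)` falls between texels 0 and 1 and is blended by the hardware. The filtering weights are stored at reduced precision, so results should be compared with a tolerance rather than for exact equality.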

So something like particle simulation can benefit from textures because calculations are generally performed in cells or control volumes where local interactions are considered, and neighbouring particles need to access each other's velocities and accelerations. A spatially local cache works better for this than a simple linear memory cache. But for other applications there is no intrinsic spatial locality in the memory access pattern, and textures provide little or no benefit over conventional cached memory.
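The neighbour-access pattern described above can be sketched with a texture object created directly over an ordinary `cudaMalloc` buffer, which also shows that "binding" does not move or copy anything: the texture is only another read path to the same global memory. This is an illustrative sketch, not the code from the particles sample; `neighbourSumTex`, `neighbourSumGlobal` and `texturePathMatchesGlobal` are hypothetical names, and the input values are arbitrary. It checks that both paths give the same answer and makes no claim about which is faster on a given GPU.

```cuda
#include <cassert>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Sum each element with its two neighbours, reading through the texture path.
__global__ void neighbourSumTex(cudaTextureObject_t tex, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float s = tex1Dfetch<float>(tex, i);
    if (i > 0)     s += tex1Dfetch<float>(tex, i - 1);
    if (i < n - 1) s += tex1Dfetch<float>(tex, i + 1);
    out[i] = s;
}

// Same computation with ordinary global loads from the very same buffer.
__global__ void neighbourSumGlobal(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float s = in[i];
    if (i > 0)     s += in[i - 1];
    if (i < n - 1) s += in[i + 1];
    out[i] = s;
}

// Hypothetical host wrapper: returns true when both paths agree exactly.
bool texturePathMatchesGlobal(int n)
{
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i % 17);

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dIn = nullptr, *dTex = nullptr, *dGlob = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dTex, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dGlob, bytes) != cudaSuccess) return false;
    cudaMemcpy(dIn, h.data(), bytes, cudaMemcpyHostToDevice);

    // Describe the existing linear buffer as a texture resource (no copy).
    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dIn;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = bytes;

    cudaTextureDesc td;
    std::memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeElementType;   // return the raw float values

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess)
        return false;

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    neighbourSumTex<<<blocks, threads>>>(tex, dTex, n);
    neighbourSumGlobal<<<blocks, threads>>>(dIn, dGlob, n);
    if (cudaDeviceSynchronize() != cudaSuccess) return false;

    std::vector<float> a(n), b(n);
    cudaMemcpy(a.data(), dTex, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(b.data(), dGlob, bytes, cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFree(dIn); cudaFree(dTex); cudaFree(dGlob);
    return a == b;
}
```

To see whether the texture path actually helps for a particular access pattern, one would time the two kernels separately (for example with `cudaEvent_t` timers or a profiler) on the target hardware. With a regular, coalesced pattern like this one the difference may be small or absent, which is the point of the caveat above.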

Tags:

cuda

smilingbuddha
