
Which is faster in CUDA: Constant Memory or Texture Memory?

Tags:

cuda

I know that both are stored in off-chip DRAM and are cached.

But which is faster in access speed? Or in what circumstances is one faster than the other?

asked Jul 14 '12 02:07 by lei_z


People also ask

Is texture memory faster than global memory?

In my application texture memory is a bit faster than global memory on Fermi, even though the CUDA documentation recommends just the opposite, and it is much faster on a GTX 280. So the only way to know is to experiment! If you are doing anything that has a 2D/3D spatial component to it, then I'd recommend texture memory.

When to use constant memory in CUDA?

In some situations, using constant memory instead of global memory may reduce the required memory bandwidth (which is beneficial for kernels). Constant memory is also most effective when all threads access the same value at the same time (i.e., the array index is not a function of the thread index).
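A minimal sketch of that access pattern (the names `d_poly` and `evalPoly` are illustrative, not from the post): every thread in a warp reads the same `d_poly[k]` on each loop iteration, which is exactly the broadcast case constant memory is built for.

```cuda
// Polynomial coefficients in constant memory: off-chip DRAM, but served
// through the on-chip constant cache with single-cycle broadcast.
__constant__ float d_poly[8];

__global__ void evalPoly(const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc = 0.0f;
    for (int k = 7; k >= 0; --k)
        acc = acc * x[i] + d_poly[k];  // same address for every thread in the warp
    y[i] = acc;
}

// Host side, before launch:
//   cudaMemcpyToSymbol(d_poly, h_poly, 8 * sizeof(float));
```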

What is texture memory in CUDA?

Texture memory is read-only device memory, and can be accessed using the device functions described in Texture Functions. Reading a texture using one of these functions is called a texture fetch. Texture memory traffic is routed through the texture cache (which is independent of the L1 data cache) and the L2 cache.

Why is shared memory faster CUDA?

Shared memory is a powerful feature for writing well-optimized CUDA code. Access to shared memory is much faster than global memory access because it is located on-chip. Because shared memory is shared by the threads in a thread block, it provides a mechanism for threads to cooperate.
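A classic sketch of that cooperation is a block-level reduction (the kernel name and block size of 256 are assumptions for illustration): threads stage values in on-chip shared memory, then combine them in a tree.

```cuda
// Each block sums 256 inputs using a shared-memory tree reduction.
__global__ void blockSum(const float *in, float *out)
{
    __shared__ float tile[256];              // on-chip, visible to the whole block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];
    __syncthreads();                         // all loads complete before anyone reads

    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];           // one partial sum per block
}
```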


2 Answers

Constant memory is optimized for broadcast, i.e. when the threads in a warp all read the same memory location. If they are reading different locations, it will work, but each different location referenced by a warp costs more time. When a read is being broadcast to the threads, constant memory is MUCH faster than texture memory.
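The two access patterns described above can be put side by side in one sketch (the kernel and `table` are hypothetical):

```cuda
__constant__ float table[256];

__global__ void broadcastVsSerialized(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Fast: the address is uniform across the warp, so the constant cache
    // broadcasts one value to all 32 threads in a single transaction.
    float a = table[blockIdx.x % 256];

    // Slower: the address depends on threadIdx.x, so a warp can reference up
    // to 32 distinct locations, and those reads are serialized.
    float b = table[i % 256];

    out[i] = a + b;
}
```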

Texture memory has high latency, even for cache hits. You can think of it as a bandwidth aggregator - if there's reuse that can be serviced out of the texture cache, the GPU does not have to go out to external memory for those reads. For 2D and 3D textures, the addressing has 2D and 3D locality, so cache line fills pull in 2D and 3D blocks of memory instead of rows.

Finally, the texture pipeline can perform "bonus" calculations: handling boundary conditions ("texture addressing") and converting 8- and 16-bit values to normalized floats in [0, 1] are examples of operations that come "for free" (they are part of the reason texture reads have high latency).
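As a host-side sketch using the CUDA runtime's texture-object API (the helper name `makeTex` is an assumption), both of those "free" operations are just fields in the texture descriptor:

```cuda
// Bind an 8-bit cudaArray so that out-of-range coordinates are clamped and
// fetches return floats in [0, 1] -- no per-thread conversion code needed.
cudaTextureObject_t makeTex(cudaArray_t img)
{
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = img;

    cudaTextureDesc td = {};
    td.addressMode[0]   = cudaAddressModeClamp;       // boundary handling "for free"
    td.addressMode[1]   = cudaAddressModeClamp;
    td.filterMode       = cudaFilterModeLinear;       // hardware bilinear filtering
    td.readMode         = cudaReadModeNormalizedFloat; // 8-bit -> float in [0, 1]
    td.normalizedCoords = 1;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// In a kernel, tex2D<float>(tex, u, v) then returns the converted, filtered value.
```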

answered Jan 04 '23 05:01 by ArchaeaSoftware


Texture memory is optimized for 2D spatial locality (which is where it gets its name). You can think of constant memory as taking advantage of temporal locality.

The benefits of texture memory over constant memory can be summarized as follows:

  • Spatial locality
  • Addressing calculations are performed in hardware, outside the kernel
  • Data can be accessed by different variables in a single operation
  • 8-bit and 16-bit data can be automatically converted to floating-point numbers between 0.0 and 1.0

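The device side of those benefits can be sketched as follows (kernel name and launch geometry are assumptions): the kernel just computes coordinates and fetches; clamping, interpolation, and the 8-bit-to-float conversion all happen in the texture hardware.

```cuda
// Sample an 8-bit image bound to a texture object with normalized
// coordinates and cudaReadModeNormalizedFloat.
__global__ void sampleTex(cudaTextureObject_t tex, float *out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    // The +0.5f centers the sample on the texel.
    float u = (x + 0.5f) / w;
    float v = (y + 0.5f) / h;
    out[y * w + x] = tex2D<float>(tex, u, v);  // already converted to [0, 1]
}
```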
See the documentation for more details.

answered Jan 04 '23 05:01 by tskuzzy