I know that both are in off-chip DRAM and are cached.
But which is faster to access? Or under what circumstances is one faster than the other?
In my application, texture memory is a bit faster than global memory on Fermi, even though the CUDA documentation recommends just the opposite, and it is much faster on the GTX 280. So the only way to know is to experiment! If you are doing anything that has a 2D/3D spatial component to it, then I'd recommend texture memory.
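Since the advice above is to measure, here is a minimal timing sketch using CUDA events. The kernel `scaleGlobal` and the problem size are made up for illustration; to compare memory paths you would time a texture or constant-memory variant of your own kernel in the same way.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical baseline kernel that reads from plain global memory.
__global__ void scaleGlobal(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

int main()
{
    const int n = 1 << 20;            // illustrative size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup cost is not included in the timing.
    scaleGlobal<<<blocks, threads>>>(in, out, n);

    cudaEventRecord(start);
    scaleGlobal<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);       // wait until the timed launch has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("global-memory kernel: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

A single launch is noisy, so in practice you would probably average over many launches before drawing conclusions.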
In some situations, using constant memory instead of global memory may reduce the memory bandwidth a kernel requires (which is beneficial for kernels). Constant memory is most effective when all threads access the same value at the same time (i.e. the array index is not a function of the thread's position).
Texture memory is read-only device memory, and can be accessed using the device functions described in Texture Functions. Reading a texture using one of these functions is called a texture fetch. Texture memory traffic is routed through the texture cache (which is independent of the L1 data cache) and the L2 cache.
Shared memory is a powerful feature for writing well-optimized CUDA code. Access to shared memory is much faster than global memory access because shared memory is located on-chip. Because shared memory is shared by the threads in a thread block, it provides a mechanism for those threads to cooperate.
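To make the "threads cooperate" point concrete, here is a small sketch of a per-block sum that stages data in shared memory. The kernel name `blockSum` and the `BLOCK` size are illustrative, and the sketch assumes the block size is a power of two and a device that supports managed memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 256  // assumed block size; must be a power of two for this reduction

// Each block sums BLOCK input elements cooperatively through shared memory.
__global__ void blockSum(const float* in, float* out, int n)
{
    __shared__ float tile[BLOCK];     // on-chip, visible to all threads in the block

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                  // make every thread's store visible to the block

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = tile[0];
}

int main()
{
    const int n = 1024;
    const int blocks = (n + BLOCK - 1) / BLOCK;

    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    blockSum<<<blocks, BLOCK>>>(in, out, n);
    cudaDeviceSynchronize();

    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += out[b];
    printf("total = %f\n", total);    // 1024 ones summed, so 1024.0

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Each input element is read from global memory once; all the intermediate sums stay in shared memory.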
Constant memory is optimized for broadcast, i.e. when the threads in a warp all read the same memory location. If they are reading different locations, it will work, but each different location referenced by a warp costs more time. When a read is being broadcast to the threads, constant memory is MUCH faster than texture memory.
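A sketch of the broadcast-friendly access pattern, assuming a small, hypothetical coefficient table `coeffs` and a device that supports managed memory. At every step of the loop, all threads in a warp read the same constant address, which is the case constant memory is built for.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical coefficient table; the name and size are illustrative.
__constant__ float coeffs[4];

// Evaluates c0 + c1*x + c2*x^2 + c3*x^3 for each input element.
__global__ void polyEval(const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        // Horner's rule: the index k does not depend on the thread,
        // so each coeffs[k] read is one broadcast to the whole warp.
        float acc = coeffs[3];
        for (int k = 2; k >= 0; --k)
            acc = acc * v + coeffs[k];
        y[i] = acc;
    }
}

int main()
{
    const int n = 1024;
    const float h_coeffs[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));

    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1.0f;

    polyEval<<<(n + 255) / 256, 256>>>(x, y, n);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);      // x = 1, so 1 + 2 + 3 + 4 = 10

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

If the index were something like `coeffs[threadIdx.x % 4]` instead, the threads of a warp would hit different addresses and the reads would be serialized, which is the slow case described above.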
Texture memory has high latency, even for cache hits. You can think of it as a bandwidth aggregator - if there's reuse that can be serviced out of the texture cache, the GPU does not have to go out to external memory for those reads. For 2D and 3D textures, the addressing has 2D and 3D locality, so cache line fills pull in 2D and 3D blocks of memory instead of rows.
Finally, the texture pipeline can perform "bonus" calculations: dealing with boundary conditions ("texture addressing") and converting 8- and 16-bit values to normalized ("unitized") floats are examples of operations that can be done "for free." (They are part of the reason texture reads have high latency.)
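Here is a sketch of both "free" operations using the texture object API (which needs compute capability 3.0 or later, so it does not apply to the Fermi-era hardware mentioned earlier). The kernel name `fetchRow` and the tiny 4x1 image are made up for illustration; the clamp addressing mode handles out-of-range reads, and the normalized-float read mode converts 8-bit texels to floats in [0, 1].

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reads two texels to the right of each thread's own position. The last
// threads run past the edge; clamp addressing returns the edge texel instead.
__global__ void fetchRow(cudaTextureObject_t tex, float* out, int width)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x < width) {
        // +0.5f targets the texel center when using unnormalized coordinates.
        out[x] = tex2D<float>(tex, x + 2 + 0.5f, 0.5f);
    }
}

int main()
{
    const int W = 4, H = 1;
    const unsigned char h_data[W] = {0, 51, 102, 255};

    // Back the texture with a CUDA array of 8-bit texels.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned char>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, W, H);
    cudaMemcpy2DToArray(arr, 0, 0, h_data, W * sizeof(unsigned char),
                        W * sizeof(unsigned char), H, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;     // boundary handling in hardware
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;          // no interpolation
    td.readMode = cudaReadModeNormalizedFloat;    // 8-bit value -> float in [0, 1]
    td.normalizedCoords = 0;                      // address in texel units

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float* out;
    cudaMallocManaged(&out, W * sizeof(float));
    fetchRow<<<1, 32>>>(tex, out, W);
    cudaDeviceSynchronize();

    for (int i = 0; i < W; ++i) printf("out[%d] = %f\n", i, out[i]);

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(out);
    return 0;
}
```

The kernel contains no bounds check on the shifted coordinate and no integer-to-float scaling; both are handled by the texture unit as configured in `cudaTextureDesc`.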
Texture memory is optimized for 2D spatial locality (which is where it gets its name from). You can kind of think of constant memory as taking advantage of temporal locality.
The benefits of texture memory over constant memory can be summarized as follows:
See the documentation for more details.