I know that both reside in off-chip DRAM and are cached. But which is faster to access? Or under what circumstances is one faster than the other?


Which is faster in CUDA: Constant Memory or Texture Memory?

2 Answers

Constant memory is optimized for broadcast, i.e. when the threads in a warp all read the same memory location. If they read different locations, it still works, but the accesses are serialized, so each distinct location referenced by a warp costs additional time. When a read is being broadcast to the threads, constant memory is MUCH faster than texture memory.
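To make the broadcast case concrete, here is a minimal sketch contrasting the two access patterns. The kernel names, the coefficient table, and the problem size are made up for illustration, and the code only checks results; it makes no timing claims.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

#define NUM_COEFF 4

// Small read-only table in constant memory (hypothetical example data).
__constant__ float coeff[NUM_COEFF];

// Broadcast-friendly: at each loop step every thread in the warp reads
// the same coeff[k], so a single constant read can serve all of them.
__global__ void polyUniform(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int k = NUM_COEFF - 1; k >= 0; --k)
        acc = acc * x[i] + coeff[k];  // Horner's rule
    y[i] = acc;
}

// Broadcast-unfriendly: the index depends on the thread, so one warp
// touches several distinct constant addresses in the same instruction.
__global__ void lookupPerThread(float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = coeff[i % NUM_COEFF];
}

bool runConstantDemo() {
    const int n = 256;
    const float h_coeff[NUM_COEFF] = {1.0f, 2.0f, 3.0f, 4.0f};
    if (cudaMemcpyToSymbol(coeff, h_coeff, sizeof(h_coeff)) != cudaSuccess)
        return false;

    float *x = nullptr, *y = nullptr, *z = nullptr;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&z, n * sizeof(float)) != cudaSuccess)
        return false;
    for (int i = 0; i < n; ++i) x[i] = 2.0f;

    polyUniform<<<(n + 127) / 128, 128>>>(x, y, n);
    lookupPerThread<<<(n + 127) / 128, 128>>>(z, n);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;

    // 1 + 2*2 + 3*4 + 4*8 = 49 for x = 2.
    for (int i = 0; ok && i < n; ++i)
        ok = fabsf(y[i] - 49.0f) < 1e-4f && z[i] == h_coeff[i % NUM_COEFF];

    cudaFree(x); cudaFree(y); cudaFree(z);
    return ok;
}

int main() {
    bool ok = runConstantDemo();
    printf("%s\n", ok ? "results match" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

Both kernels give correct results; the difference is only in how many distinct constant addresses a warp requests per instruction. If you want numbers for your own GPU, time the two kernels with a profiler rather than trusting a rule of thumb.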

Texture memory has high latency, even for cache hits. You can think of it as a bandwidth aggregator - if there's reuse that can be serviced out of the texture cache, the GPU does not have to go out to external memory for those reads. For 2D and 3D textures, the addressing has 2D and 3D locality, so cache line fills pull in 2D and 3D blocks of memory instead of rows.

Finally, the texture pipeline can perform "bonus" calculations: handling boundary conditions ("texture addressing") and converting 8- and 16-bit values to unitized float are examples of operations that can be done "for free." (They are part of the reason texture reads have high latency.)
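As a rough illustration of those "free" operations, the sketch below binds a tiny 8-bit 2D image to a texture object with clamp addressing and normalized-float reads. The image contents, its 4x4 size, and the function names are assumptions chosen for the example.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

#define W 4
#define H 4

// Each thread fetches its left neighbour. At x == 0 the coordinate falls
// outside the texture, and clamp addressing returns the edge texel instead,
// so the kernel needs no explicit bounds check.
__global__ void leftNeighbour(cudaTextureObject_t tex, float* out) {
    int x = threadIdx.x, y = threadIdx.y;
    // +0.5f addresses the texel centre when coordinates are unnormalized.
    out[y * W + x] = tex2D<float>(tex, x - 1 + 0.5f, y + 0.5f);
}

bool runTextureDemo() {
    unsigned char h_img[W * H];
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            h_img[y * W + x] = (unsigned char)(x * 85);  // 0, 85, 170, 255

    // 2D textures are backed by a CUDA array, which has a 2D-friendly layout.
    cudaChannelFormatDesc fmt = cudaCreateChannelDesc<unsigned char>();
    cudaArray_t arr = nullptr;
    if (cudaMallocArray(&arr, &fmt, W, H) != cudaSuccess) return false;
    cudaMemcpy2DToArray(arr, 0, 0, h_img, W, W, H, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;   // boundary handling in hardware
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeNormalizedFloat;  // 8-bit value -> float in [0, 1]
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess)
        return false;

    float* out = nullptr;
    if (cudaMallocManaged(&out, W * H * sizeof(float)) != cudaSuccess)
        return false;
    leftNeighbour<<<1, dim3(W, H)>>>(tex, out);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;

    for (int y = 0; ok && y < H; ++y)
        for (int x = 0; ok && x < W; ++x) {
            int src = x > 0 ? x - 1 : 0;        // what clamping should produce
            float expected = (src * 85) / 255.0f;
            ok = fabsf(out[y * W + x] - expected) < 1e-3f;
        }

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(out);
    return ok;
}

int main() {
    bool ok = runTextureDemo();
    printf("%s\n", ok ? "results match" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

The kernel itself contains no clamping logic and no division by 255; both are configured in the texture descriptor and carried out by the texture unit during the fetch.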

answered Jan 04 '23 05:01

ArchaeaSoftware

Texture memory is optimized for 2D spatial locality (where it gets its name from). You can think of constant memory as taking advantage of temporal locality.

The benefits of texture memory over constant memory can be summarized as follows:

Reads with 2D spatial locality are cached efficiently
Addressing calculations are performed outside the kernel by dedicated hardware
Packed data can be broadcast to separate variables in a single operation
8-bit and 16-bit integer data can be automatically converted to floating-point numbers between 0 and 1.0

See the documentation for more details.
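For the "packed data" point, one way to picture it is a texture whose texels are `float4`: a single fetch hands back all four components at once. The sketch below assumes a texture object bound to plain linear device memory; the data values and function names are invented for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One texture fetch returns a whole float4; its four components land in
// p.x, p.y, p.z and p.w without four separate loads in the kernel.
__global__ void sumComponents(cudaTextureObject_t tex, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float4 p = tex1Dfetch<float4>(tex, i);
    out[i] = p.x + p.y + p.z + p.w;
}

bool runPackedDemo() {
    const int n = 64;
    float4 h_data[n];
    for (int i = 0; i < n; ++i)
        h_data[i] = make_float4((float)i, 1.0f, 2.0f, 3.0f);

    float4* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float4)) != cudaSuccess) return false;
    cudaMemcpy(d_data, h_data, n * sizeof(float4), cudaMemcpyHostToDevice);

    // Bind the linear device buffer to a texture object.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_data;
    res.res.linear.desc = cudaCreateChannelDesc<float4>();
    res.res.linear.sizeInBytes = n * sizeof(float4);

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;  // return the stored floats as-is

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess)
        return false;

    float* out = nullptr;
    if (cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess)
        return false;
    sumComponents<<<1, n>>>(tex, out, n);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;

    for (int i = 0; ok && i < n; ++i)
        ok = out[i] == (float)i + 6.0f;  // i + 1 + 2 + 3

    cudaDestroyTextureObject(tex);
    cudaFree(d_data);
    cudaFree(out);
    return ok;
}

int main() {
    bool ok = runPackedDemo();
    printf("%s\n", ok ? "results match" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

Linear memory is used here only to keep the setup short; a texture bound this way does not get the 2D locality of a CUDA array.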

answered Jan 04 '23 05:01

tskuzzy


Tags:

cuda

lei_z
