 

Clarifying memory transactions in CUDA

Tags:

cuda

gpu

I am confused about the following statements in section 5.3.2.1 (Performance Guidelines) of the CUDA Programming Guide 4.0.

Global memory resides in device memory and device memory is accessed
via 32-, 64-, or 128-byte memory transactions.

These memory transactions must be naturally aligned: only the 32-, 64-,
or 128-byte segments of device memory that are aligned to their size
(i.e. whose first address is a multiple of their size) can be read or
written by memory transactions.

1) My understanding of device memory was that accesses to device memory by threads are uncached: so if a thread accesses memory location a[i], it will fetch only a[i] and none of the values around a[i]. The first statement seems to contradict this. Or perhaps I am misunderstanding the usage of the phrase "memory transaction" here?

2) The second sentence does not seem very clear. Can someone explain this?

smilingbuddha asked Aug 10 '12


1 Answer

  1. Memory transactions are performed per warp. So a 32-byte transaction is a warp-sized read of an 8-bit type, a 64-byte transaction is a warp-sized read of a 16-bit type, and a 128-byte transaction is a warp-sized read of a 32-bit type.
  2. It just means that all transactions must be aligned to their own size. It is not possible for a warp to read a 128-byte transaction with a one-byte offset; such an access would have to be serviced by more than one transaction. See this answer for more details.
talonmies answered Sep 16 '22