CUDA: bank conflicts between different warps?

I just learned (from "Why only one of the warps is executed by a SM in cuda?") that Kepler GPUs can actually execute instructions from several (apparently 4) warps at once.

Can a shared memory bank also serve four requests at once? If not, that would mean that bank conflicts can occur between threads of different warps that happen to be executed concurrently, even though there are no bank conflicts within any of the individual warps, right? Is there any information on this?

asked Feb 15 '14 19:02

user3314215

1 Answer

Compute capability 3.x devices (Kepler) have 4 warp schedulers per SM. On each cycle, each warp scheduler selects a warp and issues 1-2 instructions from that warp. The SM has only one load/store unit (LSU) to service L1 and shared memory requests, so only 1 of the 8 potential instructions can be dispatched to the LSU. Because the LSU never handles shared memory requests from two warps at the same time, bank conflicts between warps will not occur.

answered Sep 30 '22 13:09

Greg Smith
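Since the answer implies that bank conflicts only need to be analyzed within a single warp's request, the following is a minimal sketch of that per-warp analysis. It is not part of the original answer. It assumes 32 banks in 4-byte mode (Kepler can also be configured for 8-byte banks) and a simplified model in which threads reading the same word are served by one broadcast. The names `conflict_degree` and `strided_access` are hypothetical helpers made up for this illustration.

```python
from collections import defaultdict

NUM_BANKS = 32   # shared memory banks (assumed)
BANK_WIDTH = 4   # bytes per bank word (assumed 4-byte bank mode)
WARP_SIZE = 32   # threads per warp


def conflict_degree(byte_addresses):
    """Estimate how many passes one warp-wide shared memory request needs.

    Simplified model: distinct words that fall into the same bank are
    serialized, while threads touching the same word share one broadcast.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // BANK_WIDTH
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


def strided_access(stride_words):
    """Byte addresses touched by one warp reading 4-byte elements with a stride."""
    return [lane * stride_words * BANK_WIDTH for lane in range(WARP_SIZE)]


if __name__ == "__main__":
    for stride in (0, 1, 2, 32, 33):
        degree = conflict_degree(strided_access(stride))
        print(f"stride {stride:2d}: {degree}-way")
```

In this model only the addresses of one warp's request are ever examined together, which mirrors the point above: what other warps are doing at the same time does not enter the calculation.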

Tags: cuda, shared-memory, bank-conflict
