CUDA programming - L1 and L2 caches

Could you please explain the difference between using both the L1 and L2 caches and using only the L2 cache in CUDA programming? What should I expect in terms of execution time? When should I expect a lower GPU time: with both L1 and L2 enabled, or with only L2 enabled? Thanks.

asked Apr 16 '12 20:04

Saman I

1 Answer

Typically you would leave both the L1 and L2 caches enabled. You should try to coalesce your memory accesses wherever you can, i.e. threads within a warp should access data within the same 128B segment as far as possible (see the CUDA Programming Guide for more info on this topic).
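To make the coalescing point concrete, here is a minimal sketch contrasting two access patterns. The names `copy_coalesced`, `copy_strided` and `run_copies` are made up for this example, and it assumes 4-byte floats and 32-thread warps; it only shows the two patterns and checks the results, it does not claim any particular speed-up.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Adjacent threads read adjacent floats: a 32-thread warp reads
// 32 * 4 B = 128 B, which fits a single aligned 128B segment.
__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Adjacent threads read floats `stride` elements apart. With stride = 32
// neighbouring threads are 128 B apart, so they land in different segments.
__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        long long j = (static_cast<long long>(i) * stride) % n;
        out[i] = in[j];
    }
}

// Hypothetical host helper: runs both kernels and verifies their output.
bool run_copies(int n, int stride) {
    std::vector<float> h_in(n), h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    size_t bytes = n * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    copy_coalesced<<<blocks, threads>>>(d_in, d_out, n);
    cudaMemcpy(h_a.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    copy_strided<<<blocks, threads>>>(d_in, d_out, n, stride);
    cudaMemcpy(h_b.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    bool ok = (cudaGetLastError() == cudaSuccess);
    cudaFree(d_in);
    cudaFree(d_out);

    // Compare against the same index mapping computed on the host.
    for (int i = 0; ok && i < n; ++i) {
        long long j = (static_cast<long long>(i) * stride) % n;
        ok = (h_a[i] == h_in[i]) && (h_b[i] == h_in[j]);
    }
    return ok;
}
```

Calling `run_copies(1 << 20, 32)` exercises both kernels; timing each launch with a profiler or CUDA events is the way to see how much the strided pattern costs on your own hardware.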

Some programs cannot be optimised in this manner, for example because their memory accesses are completely random. In those cases it may be beneficial to bypass the L1 cache, which avoids loading an entire 128B line when you only want, say, 4 bytes (you will still load 32B, since that is the minimum). The efficiency gain is clear: 4 useful bytes out of 128 (about 3%) improves to 4 out of 32 (12.5%).
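The random-access case can be sketched as a gather through an index array. `gather` and `useful_fraction` are hypothetical names for this illustration. On compute capability 2.x (Fermi) devices, global loads can be made to bypass L1 by compiling with `nvcc -Xptxas -dlcm=cg` (the default there is `-dlcm=ca`, which caches in both L1 and L2). Caching of global loads works differently on later architectures, so treat the flag as an assumption about your target and measure both builds.

```cuda
#include <cuda_runtime.h>

// Each thread loads one 4-byte float from a data-dependent location.
// If idx[] is effectively random, the threads of a warp rarely share
// a cache line, so most of every fetched line goes unused.
__global__ void gather(const float* in, const int* idx, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[idx[i]];
}

// Fraction of fetched bytes that are actually used when every access
// pulls in `fetched_bytes` but only needs `useful_bytes`.
__host__ __device__ inline float useful_fraction(int useful_bytes,
                                                 int fetched_bytes) {
    return static_cast<float>(useful_bytes) / static_cast<float>(fetched_bytes);
}

// Through L1 (128B lines): useful_fraction(4, 128) == 0.03125
// L2 only (32B segments):  useful_fraction(4, 32)  == 0.125
```

The arithmetic only describes bus efficiency for fully scattered loads; if some of the fetched line is reused later, keeping L1 enabled may still win, which is why timing both configurations is the safer guide.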

answered Sep 21 '22 06:09

Tom

Tags:

cuda

coalescing
