
Are GPU/CUDA cores SIMD ones?

Tags: cuda, gpgpu, gpu, simd

Let's take the nVidia Fermi Compute Architecture whitepaper. It says:

The first Fermi based GPU, implemented with 3.0 billion transistors, features up to 512 CUDA cores. A CUDA core executes a floating point or integer instruction per clock for a thread. The 512 CUDA cores are organized in 16 SMs of 32 cores each.

[...]

Each CUDA processor has a fully pipelined integer arithmetic logic unit (ALU) and floating point unit (FPU).

[...]

In Fermi, the newly designed integer ALU supports full 32-bit precision for all instructions, consistent with standard programming language requirements. The integer ALU is also optimized to efficiently support 64-bit and extended precision operations.

From what I know (and this is the part that is unclear to me), GPUs execute threads in so-called warps, each warp consisting of ~32 threads. Each warp is assigned to only one core (is that true?). So does that mean that each of the 32 cores of a single SM is a SIMD processor, where a single instruction handles 32 data portions? If so, why do we say there are 32 threads in a warp, rather than a single SIMD thread? And why are cores sometimes referred to as scalar processors, not vector processors?

asked Feb 02 '15 by Marc Andreson


2 Answers

Each warp is assigned to only one core (is that true?).

No, it's not true. A warp is a logical assembly of 32 threads of execution. To execute a single instruction from a single warp, the warp scheduler must usually schedule 32 execution units (or "cores", although the definition of a "core" is somewhat loose).

Cores are in fact scalar processors, not vector processors. 32 cores (or execution units) are marshalled by the warp scheduler to execute a single instruction, across 32 threads, which is where the "SIMT" moniker comes from.
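As a rough illustration of that SIMT model, here is a minimal sketch: every thread derives its warp and lane index from threadIdx.x, and it is the 32 threads sharing a warp index that get issued each instruction together.

    #include <cstdio>

    // Sketch: each thread computes which warp and lane it occupies.
    // The 32 threads of one warp execute this instruction stream in
    // lockstep -- one instruction, marshalled across 32 "cores"/lanes.
    __global__ void whoAmI()
    {
        int lane = threadIdx.x % warpSize;   // position within the warp (0..31)
        int warp = threadIdx.x / warpSize;   // warp index within the block
        if (lane == 0)                       // one report per warp
            printf("block %d, warp %d: 32 lanes issued together\n",
                   blockIdx.x, warp);
    }

    int main()
    {
        whoAmI<<<1, 64>>>();                 // 64 threads = 2 warps
        cudaDeviceSynchronize();
        return 0;
    }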

answered Sep 28 '22 by Robert Crovella


CUDA "cores" can be thought of as SIMD lanes.

First, let's recall that the term "CUDA core" is nVIDIA marketing-speak. These are not cores in the same sense that a CPU has cores. Similarly, "CUDA threads" are not the same as the threads we know on CPUs.

The equivalent of a CPU core on a GPU is a streaming multiprocessor (SM): it has its own instruction scheduler/dispatcher, its own L1 cache, its own shared memory, and so on. It is CUDA thread blocks rather than warps that are assigned to a GPU core, i.e. to a streaming multiprocessor. Within an SM, warps are selected to have instructions scheduled, for the entire warp. From a CUDA perspective, those are 32 separate threads which are instruction-locked; but that's really no different from saying that a warp is like a single thread which only executes 32-lane-wide SIMD instructions. Of course this isn't a perfect analogy, but I feel it's pretty sound.

Something you don't quite / don't always have on CPU SIMD lanes is masking of which lanes are actively executing: inactive lanes do not produce the effects that active lanes do (setting register values, memory writes, etc.).
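That lane masking can be observed directly with __activemask(), which returns a bitmask of the lanes currently active in the calling warp. A rough sketch (the commented mask values are what you would typically see for a single warp diverging cleanly into two halves; on architectures with independent thread scheduling the exact mask at any point is not guaranteed):

    #include <cstdio>

    // Sketch: force the single warp to diverge on a branch, then ask
    // which lanes are active on each side. __activemask() returns a
    // 32-bit mask with one bit per currently-active lane.
    __global__ void divergeDemo()
    {
        int lane = threadIdx.x % warpSize;      // lane index within the warp
        if (lane < 16) {
            unsigned mask = __activemask();     // typically 0x0000ffff here
            if (lane == 0)
                printf("lower half active, mask = 0x%08x\n", mask);
        } else {
            unsigned mask = __activemask();     // typically 0xffff0000 here
            if (lane == 16)
                printf("upper half active, mask = 0x%08x\n", mask);
        }
    }

    int main()
    {
        divergeDemo<<<1, 32>>>();               // launch exactly one warp
        cudaDeviceSynchronize();
        return 0;
    }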

I hope this helps you make intuitive sense of things.

answered Sep 28 '22 by einpoklum