Are Kepler CC 3.0 GPUs not only a pipelined architecture, but also superscalar? [closed]

Question

The CUDA 6.5 documentation says the following: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ixzz3PIXMTktb

5.2.3. Multiprocessor Level

...

8L for devices of compute capability 3.x since a multiprocessor issues a pair of instructions per warp over one clock cycle for four warps at a time, as mentioned in Compute Capability 3.x.

Does this mean that Kepler CC 3.0 GPUs are not only a pipelined architecture, but also superscalar?

Pipelining - these two sequences execute in parallel (different operations at one time):
- LOAD [addr1] -> ADD -> STORE [addr1] -> NOP
- NOP -> LOAD [addr2] -> ADD -> STORE [addr2]
Superscalar - these two sequences execute in parallel (the same operations at one time):
- LOAD [reg1] -> ADD -> STORE [reg1]
- LOAD [reg2] -> ADD -> STORE [reg2]

Robert Crovella · Accepted Answer

Yes, the warp schedulers in Kepler can schedule two instructions per clock, as long as:

- the instructions are independent
- the instructions come from the same warp
- there are sufficient execution resources in the SM for both instructions

If that fits your definition of superscalar, then it is superscalar.
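
To make the three conditions above concrete, here is a minimal toy model in Python. It is only a sketch: the `Instr` class, the `can_dual_issue` function, the unit name `"fp32"` and the register names are all hypothetical and invented for illustration. It does not describe how the real Kepler scheduler is implemented, and its independence test is deliberately conservative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Toy instruction: one destination register, some sources, one unit type."""
    warp: int
    unit: str      # hypothetical execution-unit name, e.g. "fp32"
    dst: str
    srcs: tuple


def can_dual_issue(a: Instr, b: Instr, free_units: dict) -> bool:
    """True if b may issue in the same clock as a (a precedes b in program order)."""
    # Condition: both instructions come from the same warp.
    same_warp = a.warp == b.warp
    # Condition: the instructions are independent. Conservative assumption:
    # reject read-after-write, write-after-write and write-after-read hazards.
    independent = (
        a.dst not in b.srcs and a.dst != b.dst and b.dst not in a.srcs
    )
    # Condition: enough free execution units of each type for the pair.
    needed = {}
    for ins in (a, b):
        needed[ins.unit] = needed.get(ins.unit, 0) + 1
    enough_units = all(free_units.get(u, 0) >= n for u, n in needed.items())
    return same_warp and independent and enough_units


mul = Instr(warp=0, unit="fp32", dst="r2", srcs=("r0", "r1"))
add = Instr(warp=0, unit="fp32", dst="r5", srcs=("r3", "r4"))
dep = Instr(warp=0, unit="fp32", dst="r6", srcs=("r2", "r4"))  # reads mul's result

print(can_dual_issue(mul, add, {"fp32": 2}))
print(can_dual_issue(mul, dep, {"fp32": 2}))
```

In this model `mul` and `add` can be paired, while `mul` and `dep` cannot, because `dep` reads the register that `mul` writes.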

With respect to pipelining, I view it differently. Various execution units in the Kepler SM are pipelined. Let's take a floating-point multiply as an example.

In a given clock, a Kepler warp scheduler may schedule a floating-point multiply on a floating-point unit. The result of this operation may not appear until some number of clocks later (i.e. it is not available on the next clock cycle). However, on the next clock cycle a new floating-point operation can be scheduled on the very same floating-point unit, because the hardware (the floating-point units, in this case) is pipelined.

clock    operation    pipeline stage   result
0           MPY1   ->   PS1
1                       PS2
...                     ...
N-1                     PSN         ->  result1

On the very next clock after clock 0, a new multiply instruction can be scheduled on the same hardware, and its result will appear one cycle after result1 appears.
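
The timing in the table can be sketched with a few lines of Python. This is an illustration only: the function names are hypothetical, and the stage count `N = 9` is an arbitrary assumption, not a documented Kepler latency.

```python
def result_clock(issue_clock: int, num_stages: int) -> int:
    """Clock at which an op issued at issue_clock reaches the last stage.

    Follows the table: an op issued at clock 0 is in stage N at clock N-1.
    """
    return issue_clock + num_stages - 1


def pipelined_total(num_ops: int, num_stages: int) -> int:
    """Clocks needed for num_ops back-to-back ops, one new op issued per clock."""
    return result_clock(num_ops - 1, num_stages) + 1


def unpipelined_total(num_ops: int, num_stages: int) -> int:
    """Clocks needed if each op had to wait for the previous one to finish."""
    return num_ops * num_stages


N = 9  # assumed number of pipeline stages, purely illustrative

for k in range(3):
    print(f"MPY{k + 1}: issued at clock {k}, result at clock {result_clock(k, N)}")

print("100 multiplies, pipelined:  ", pipelined_total(100, N), "clocks")
print("100 multiplies, unpipelined:", unpipelined_total(100, N), "clocks")
```

Each result arrives one clock after the previous one, so for a long run of independent multiplies the unit approaches one result per clock regardless of the latency of a single multiply.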

I'm not sure whether this is what you meant by "different operations at one time".

Tags:

cuda

gpgpu

gpu

nvidia

kepler

Alex
