
Instruction Level Parallelism (ILP) and out-of-order execution on NVIDIA GPUs

Tags:

cuda

nvidia

Do NVIDIA GPUs support out-of-order execution?

My first guess is that they don't, since such hardware would be expensive. However, the CUDA programming guide recommends using Instruction Level Parallelism (ILP) to improve performance.

Isn't ILP a feature that hardware supporting out-of-order execution can take advantage of? Or does NVIDIA's ILP simply mean compiler-level reordering of instructions, so the order is still fixed at runtime? In other words, does the compiler and/or programmer just have to arrange the instructions in such a way that ILP can be achieved at runtime through in-order execution?

asked Jul 26 '13 by user2188453



1 Answer

Pipelining is a common ILP technique and is certainly implemented on NVIDIA's GPUs. I think you'll agree that pipelining doesn't rely on out-of-order execution. Besides, NVIDIA GPUs have multiple warp schedulers per SM (2 or 4) from compute capability 2.0 onward. If your code has two (or more) consecutive and independent instructions in a thread (or the compiler reorders them that way), the schedulers can exploit this ILP as well.
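For illustration, here is a minimal sketch (a hypothetical kernel, not from the original question) of the kind of independent-instruction pattern an in-order, dual-issue scheduler can overlap: each thread keeps two separate accumulators, so the two multiply-adds in the loop body have no dependency on each other.

```cuda
// Hypothetical grid-stride dot-product kernel with 2-way ILP per thread.
// sum0 and sum1 form independent dependency chains, so the two FMAs in
// the loop body can be issued back-to-back without waiting on each other.
__global__ void dot_ilp2(const float *a, const float *b, float *partial, int n)
{
    int tid    = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    float sum0 = 0.0f;  // independent accumulator 0
    float sum1 = 0.0f;  // independent accumulator 1

    int i = tid;
    for (; i + stride < n; i += 2 * stride) {
        sum0 += a[i] * b[i];                    // no dependency between
        sum1 += a[i + stride] * b[i + stride];  // these two instructions
    }
    if (i < n)
        sum0 += a[i] * b[i];  // handle the leftover element, if any

    partial[tid] = sum0 + sum1;  // per-thread partial result
}
```

With a single accumulator, each iteration's add would depend on the previous one; splitting the chain in two gives the scheduler independent work to issue while the first FMA is still in flight.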

Here is a well-explained question on how a 2-wide warp scheduler and pipelining work together: How do nVIDIA CC 2.1 GPU warp schedulers issue 2 instructions at a time for a warp?

Also check out Vasily Volkov's presentation at GTC 2010, where he experimentally shows how ILP improves CUDA code performance: http://www.cs.berkeley.edu/~volkov/volkov10-GTC.pdf

As for out-of-order execution on GPUs, I don't think so. Hardware instruction reordering, speculative execution, and all that kind of machinery are too expensive to implement per SM, as you suspected. Thread-level parallelism can fill the gap left by the lack of out-of-order execution: when a true dependency stalls one warp, other warps can kick in and fill the pipe.

answered Sep 21 '22 by Superspr