
Do modern GPUs (e.g. Fermi/Evergreen) support out-of-order execution?

I am writing an OpenCL kernel that uses a few barriers inside a loop. I tested the kernel on a CPU (8-core FX-8150), and the results show that the barriers slow it down by a factor of 50 to 100 (I verified this further by re-implementing the kernel in Java using multi-threading and CyclicBarrier). I suspect the reason is that a barrier essentially prevents the CPU from taking advantage of out-of-order execution, so I am a little worried that I would see a slowdown of the same magnitude on a GPU. I checked a few official documents and googled around a bit, but there is little information available on this topic.
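For reference, a Java comparison of this kind could look roughly like the following minimal sketch. It is not the actual kernel: the class name `BarrierLoopSketch`, the thread and iteration counts, and the trivial arithmetic standing in for the per-step work are all assumptions made for illustration. It runs the same loop with and without a `CyclicBarrier` at the end of every step and prints the elapsed time for each.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the kernel: each worker loops over "steps" and,
// optionally, waits for all other workers at the end of every step.
class BarrierLoopSketch {

    // Runs `threads` workers for `iterations` steps each and returns a checksum
    // of their results, so the work cannot be optimized away.
    static long run(int threads, int iterations, boolean useBarrier) {
        CyclicBarrier barrier = new CyclicBarrier(threads);
        AtomicLong checksum = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long acc = id;
                try {
                    for (int i = 0; i < iterations; i++) {
                        acc = acc * 31 + i;       // placeholder for the real per-step work
                        if (useBarrier) {
                            barrier.await();      // all workers synchronize after every step
                        }
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
                checksum.addAndGet(acc);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return checksum.get();
    }

    // Wall-clock time of one run, in microseconds.
    static long timeMicros(int threads, int iterations, boolean useBarrier) {
        long start = System.nanoTime();
        run(threads, iterations, useBarrier);
        return (System.nanoTime() - start) / 1_000;
    }

    public static void main(String[] args) {
        int threads = 8;           // assumed: one worker per core
        int iterations = 100_000;  // assumed loop length
        System.out.println("without barrier: " + timeMicros(threads, iterations, false) + " us");
        System.out.println("with barrier:    " + timeMicros(threads, iterations, true) + " us");
    }
}
```

With so little work per step, the measured difference mostly reflects the cost of the barrier itself (thread parking and wake-up), so the ratio will vary with the machine, the JVM, and how much real work each step does.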

aaronqli asked Sep 08 '12 14:09



2 Answers

Current state-of-the-art GPUs are in-order pipelined processors. GPUs keep the pipeline full by interleaving instructions from different warps (wavefronts). In comparison, CPUs use out-of-order, speculative execution to fill the pipeline. GPUs do have different functional units, such as ALUs and SFUs, with separate pipelines, but note that an instruction dependency stalls the warp. For more information on how GPUs resolve instruction dependencies, refer to this NVIDIA patent.
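To make the interleaving idea concrete, here is a toy model in Java. It is only a sketch under strong assumptions, not a description of how Fermi or Evergreen actually schedule: every instruction depends on the previous instruction of the same warp, every result takes a fixed `latency` in cycles, and a hypothetical scheduler issues at most one instruction per cycle, round-robin over the warps that are ready. The class and method names are made up for illustration.

```java
// Toy model of an in-order pipeline that hides latency by switching warps.
class WarpInterleavingSketch {

    // Returns the cycle at which the last result becomes available.
    static int cyclesToFinish(int warps, int instrPerWarp, int latency) {
        int[] issued = new int[warps];   // instructions issued so far, per warp
        int[] readyAt = new int[warps];  // first cycle at which the warp may issue again
        int remaining = warps * instrPerWarp;
        int cycle = 0;
        int next = 0;                    // round-robin starting point
        int lastDone = 0;
        while (remaining > 0) {
            for (int k = 0; k < warps; k++) {
                int w = (next + k) % warps;
                // In-order within a warp: it may issue only once its previous
                // instruction has produced its result.
                if (issued[w] < instrPerWarp && readyAt[w] <= cycle) {
                    issued[w]++;
                    readyAt[w] = cycle + latency;
                    lastDone = Math.max(lastDone, readyAt[w]);
                    remaining--;
                    next = (w + 1) % warps;
                    break;               // at most one issue per cycle
                }
            }
            cycle++;                     // if no warp was ready, this cycle is a stall
        }
        return lastDone;
    }

    public static void main(String[] args) {
        // One warp: every instruction waits for the previous one.
        System.out.println(cyclesToFinish(1, 4, 8)); // prints 32
        // Eight warps: another warp issues while the first one waits.
        System.out.println(cyclesToFinish(8, 4, 8)); // prints 39
    }
}
```

In this model a single warp needs 32 cycles for 4 dependent instructions, while 8 warps finish 32 instructions in 39 cycles, because the stalls of one warp are filled with instructions from the others. A dependency still stalls the individual warp; the pipeline only stays busy as long as enough other warps are ready.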

lashgar answered Oct 02 '22 15:10



NVIDIA’s Next Generation CUDA Compute and Graphics Architecture, Code-Named “Fermi”:

According to page 5, the NVIDIA GigaThread engine has the following capabilities:

  • 10x faster application context switching
  • Concurrent kernel execution
  • Out of Order thread block execution :)
  • Dual overlapped memory transfer engines

Evergreen has SIMD capabilities and has a chance of outperforming some Fermi cards, but I don't know whether it supports out-of-order execution. The HD 7000 series also has the upper hand over the GTX 600 series in "local atomic add" (nearly 10x faster).

huseyin tugrul buyukisik answered Oct 02 '22 16:10
