
How much of a modern graphics pipeline uses dedicated hardware?

To put the question another way: if one were to try and reimplement OpenGL or DirectX (or an analogue) using GPGPU (CUDA, OpenCL), where and why would it be slower than the stock implementations on NVIDIA and AMD cards?

I can see how vertex/fragment/geometry/tessellation shaders could be made nice and fast using GPGPU, but what about things like generating the list of fragments to be rendered, clipping, texture sampling and so on?

I'm asking purely for academic interest.

asked Oct 30 '11 by DaedalusFall

2 Answers

Modern GPUs still have a lot of fixed-function hardware which is hidden from the compute APIs. This includes the blending stages, the triangle rasterization, and a lot of on-chip queues. The shaders of course all map well to CUDA/OpenCL; after all, shaders and the compute languages use the same part of the GPU: the general-purpose shader cores. Think of those units as a bunch of very wide SIMD CPUs (for instance, a GTX 580 has 16 cores, each with a 32-wide SIMD unit.)
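To make that concrete, here is a minimal sketch (not from the answer; the struct layout and kernel name are hypothetical) of how a per-fragment shading computation maps onto a CUDA kernel running on those same general-purpose cores. The arithmetic translates directly; what compute mode does not give you is everything around it, such as rasterization, blending and the on-chip queues.

```cuda
// Hypothetical sketch: a "fragment shader" body expressed as a CUDA kernel.
// The math maps 1:1 onto the general-purpose shader cores; the surrounding
// fixed-function stages (rasterization, blending, queues) are what is missing.
#include <cuda_runtime.h>

struct Fragment { float x, y; float r, g, b; };

__global__ void shadeFragments(const Fragment* in, float4* out, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;

    Fragment f = in[i];
    // Trivial "shading": gamma-correct the interpolated color.
    out[i] = make_float4(powf(f.r, 1.0f / 2.2f),
                         powf(f.g, 1.0f / 2.2f),
                         powf(f.b, 1.0f / 2.2f),
                         1.0f);
}
```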

You do get access to the texture units from compute though, so there's no need to implement texture sampling yourself in "compute". If you did, performance would most likely suffer, because you would lose the texture caches, which are optimized for spatially local access patterns.
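As a rough illustration (kernel names and parameters are hypothetical, not from the answer), compare fetching through a CUDA texture object, which uses the texture cache and filtering hardware, with filtering by hand out of linear global memory:

```cuda
// Hypothetical sketch: hardware texture sampling vs. manual bilinear filtering.
#include <cuda_runtime.h>

__global__ void sampleHW(cudaTextureObject_t tex, const float2* uv, float4* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // One filtered fetch through the texture unit and its cache.
    out[i] = tex2D<float4>(tex, uv[i].x, uv[i].y);
}

__global__ void sampleSW(const float4* img, int w, int h,
                         const float2* uv, float4* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Manual bilinear filter from global memory: four loads plus the weighting
    // math, with no spatially-optimized texture cache behind it.
    float x = uv[i].x * w - 0.5f, y = uv[i].y * h - 0.5f;
    int x0 = (int)floorf(x), y0 = (int)floorf(y);
    float fx = x - x0, fy = y - y0;
    int x1 = min(x0 + 1, w - 1), y1 = min(y0 + 1, h - 1);
    x0 = max(x0, 0); y0 = max(y0, 0);
    float4 c00 = img[y0 * w + x0], c10 = img[y0 * w + x1];
    float4 c01 = img[y1 * w + x0], c11 = img[y1 * w + x1];
    float4 r;
    r.x = (c00.x * (1 - fx) + c10.x * fx) * (1 - fy) + (c01.x * (1 - fx) + c11.x * fx) * fy;
    r.y = (c00.y * (1 - fx) + c10.y * fx) * (1 - fy) + (c01.y * (1 - fx) + c11.y * fx) * fy;
    r.z = (c00.z * (1 - fx) + c10.z * fx) * (1 - fy) + (c01.z * (1 - fx) + c11.z * fx) * fy;
    r.w = 1.0f;
    out[i] = r;
}
```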

You shouldn't underestimate the amount of work required for rasterization. This is a major problem: even if you throw the whole GPU at it, you reach roughly 25% of the fixed-function rasterizer's performance (see: High-Performance Software Rasterization on GPUs.) That figure includes the blending costs, which are also usually handled by fixed-function units.
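To give a feel for the per-pixel work involved, here is a deliberately naive sketch (hypothetical names, one thread per pixel of a triangle's bounding box, edge-function coverage test only). The paper's implementation is far more sophisticated, with tiled, hierarchical binning and correct ordering; the fixed-function rasterizer does all of this in dedicated hardware.

```cuda
// Hypothetical sketch: the coverage test at the heart of a software rasterizer.
__device__ float edgeFn(float2 a, float2 b, float2 p)
{
    return (p.x - a.x) * (b.y - a.y) - (p.y - a.y) * (b.x - a.x);
}

__global__ void coverTriangle(float2 v0, float2 v1, float2 v2,
                              int bx, int by, int bw, int bh,
                              unsigned char* mask, int fbWidth)
{
    int px = bx + blockIdx.x * blockDim.x + threadIdx.x;
    int py = by + blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= bx + bw || py >= by + bh) return;

    float2 p = make_float2(px + 0.5f, py + 0.5f);
    // The pixel center is covered if it lies on the same side of all three edges.
    bool inside = edgeFn(v0, v1, p) >= 0.0f &&
                  edgeFn(v1, v2, p) >= 0.0f &&
                  edgeFn(v2, v0, p) >= 0.0f;
    if (inside) mask[py * fbWidth + px] = 255;
}
```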

Tessellation also has a fixed-function part which is difficult to emulate efficiently, as it can amplify the input by up to 1:4096, and you surely don't want to reserve that much memory up front.
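A back-of-the-envelope calculation shows why that up-front reservation hurts (the patch count and vertex size below are assumptions for illustration, not figures from the answer):

```cuda
// Hypothetical arithmetic: worst-case buffer a compute-based tessellator
// would have to pre-allocate, since it cannot stream through on-chip queues.
#include <cstdio>

int main()
{
    const size_t patches        = 100000;  // patches per frame (assumed)
    const size_t maxAmplify     = 4096;    // worst-case vertices per patch
    const size_t bytesPerVertex = 32;      // e.g. position + normal + uv (assumed)

    size_t worstCase = patches * maxAmplify * bytesPerVertex;
    printf("Worst-case pre-allocated buffer: %.1f GB\n", worstCase / 1e9);
    // ~13 GB for these numbers; the fixed-function pipeline instead streams
    // tessellated vertices through small on-chip buffers.
    return 0;
}
```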

Next, you pay further performance penalties because you don't have access to framebuffer compression; again there is dedicated hardware for this which is "hidden" from you in compute-only mode. Finally, as you don't have the on-chip queues, it will be difficult to reach the same utilization that the graphics pipeline achieves (for instance, it can easily buffer vertex-shader output depending on shader load; in compute you can't switch between shaders that flexibly.)

answered Nov 15 '22 by Anteru


An interesting source-code link: http://code.google.com/p/cudaraster/

And the corresponding research paper: http://research.nvidia.com/sites/default/files/publications/laine2011hpg_paper.pdf

Some researchers at NVIDIA have implemented and benchmarked exactly what was asked in this post: an open-source implementation of "High-Performance Software Rasterization on GPUs".

And it is open source, for "purely academic interest": it implements a limited subset of OpenGL, mainly for benchmarking triangle rasterization.

answered Nov 15 '22 by Arnaud Nauwynck