I am essentially trying to understand how GPUs work when it comes to converting floating-point vertex coordinates to fixed-point numbers during rasterization.
I read this excellent article, which already explains a lot of things, but it also confuses me. The article explains that because we use 32-bit integers and the edge function has the form (a - b)*(c - d) - (e - f)*(g - h), we are limited to coordinates in the range [-16384,16383]. I understand how we get to this number. Here are my questions:
- First, this suggests that vertex coordinates can be negative. What I don't understand is that, at that stage, vertex coordinates are in raster space, and all triangles should already have been clipped. So shouldn't all x-coordinates be in the range [0, image width] and all y-coordinates in the range [0, image height]? Why can coordinates be negative?
The short answer is that, while the triangles have been clipped, they haven't been clipped to the viewport (0,0 - image width,image height). Instead, they are clipped to the guard-band clipping region, a larger rectangle that surrounds the viewport. Vertices that lie outside the viewport but within the guard-band clipping region can therefore have negative coordinates.
There are (at least) three types of triangle clipping. The first is "analytic clipping": you calculate the intersections of the triangle edges with the edges of the guard-band clip region where they overlap it, cut the triangle off at those points, and subdivide the remainder into smaller triangles, each of which lies inside the clip region. The second type is clipping the triangle's bounding box against the viewport to find the range of pixels to iterate over while rasterizing (note this doesn't change the triangle vertex coordinates; see the sketch below). The third type is the per-pixel test described in the article, where you iterate across the screen and test each pixel to see whether it is inside the triangle.
On top of this, depending on the implementation, the center of the screen may be defined as (0,0) internally for the purposes of clipping calculations, meaning that anything on the left half of the screen is going to have a negative x-coordinate.
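As a concrete illustration of the second type, here is a minimal sketch of clamping a triangle's bounding box to the viewport before rasterizing. The function name is just illustrative (not any real GPU's code), and Point2D is the simple two-int struct the article's code uses; note that the vertex coordinates themselves are left untouched:

#include <algorithm>

struct Point2D { int x, y; };

// Restrict the range of pixels to iterate over to the viewport
// [0, width) x [0, height). The vertices keep their (possibly
// negative) coordinates; only the iteration bounds change.
void clipBoundsToViewport(const Point2D& v0, const Point2D& v1, const Point2D& v2,
                          int width, int height,
                          int& minX, int& minY, int& maxX, int& maxY)
{
    minX = std::max(std::min({v0.x, v1.x, v2.x}), 0);
    minY = std::max(std::min({v0.y, v1.y, v2.y}), 0);
    maxX = std::min(std::max({v0.x, v1.x, v2.x}), width - 1);
    maxY = std::min(std::max({v0.y, v1.y, v2.y}), height - 1);
}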
- So the author explains that the range [-16384,16383] is too limited. Indeed, if the image is 2048 pixels wide and you use 256 sub-pixel positions, the fixed-point x-coordinate of a point at the right edge would be 2048 * 256 = 524288, well outside that range, so you would get overflow. The author goes on to explain how GPUs work around this problem, but I simply don't get it. If someone could explain how it's practically done on a GPU, that would be great.
Note: I'm not a GPU engineer and so this is only a high-level conceptual answer:
The key phrase in the explanation given in the article is incremental evaluation. Take a look at the orient2d equation:
// Returns twice the signed area of triangle (a, b, c); the sign tells
// you which side of the directed edge a->b the point c lies on.
int orient2d(const Point2D& a, const Point2D& b, const Point2D& c)
{
    return (b.x-a.x)*(c.y-a.y) - (b.y-a.y)*(c.x-a.x);
}
Points a and b are triangle vertices, whereas point c is the screen co-ordinate. For a given triangle, the triangle vertices remain the same while you iterate over the range of screen co-ordinates; only point c changes. Incremental evaluation means you calculate only what has changed since the previous time you evaluated the equation.
Suppose we evaluate the equation one time and get a result w0:

w0 = (b.x-a.x)*(c.y-a.y) - (b.y-a.y)*(c.x-a.x);
Then c.x gets incremented by an amount s (the per-pixel step). The new value of w0 is going to be:

w0_new = (b.x-a.x)*(c.y-a.y) - (b.y-a.y)*(c.x+s-a.x);
Subtracting the first equation from the second, we get:
w0_new - w0 = -(b.y-a.y)*s;
-(b.y-a.y)*s is a constant value for a given triangle, because s is the same amount each time (one pixel) and, as already mentioned, a and b are constant too. We can calculate it once, store it in a variable (call it w0_step), and then the per-pixel calculation reduces to:
w0_new = w0 + w0_step;
You can do the same for w1 and w2, and a similar thing for the c.y step. The reason this allows more precision is that the per-pixel equation no longer contains a fixed-point multiply, which is what causes the overflow. The GPU can do a high-precision calculation once per triangle (e.g. in 64 bits) and then a lower-precision one per pixel (e.g. in 32 bits).
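Putting this together, here is a minimal sketch of the whole scheme, closely following the structure of the rasterizer in the article. It reuses Point2D and orient2d from above; drawPixel is a hypothetical placeholder, and for clarity the coordinates are whole pixels (with n bits of sub-pixel precision, the steps and the setup evaluation would be scaled by 1 << n, with the setup typically done in 64 bits):

// Rasterize a counter-clockwise triangle over the pixel range
// [minX,maxX] x [minY,maxY] using incrementally evaluated edge functions.
void rasterizeTriangle(const Point2D& v0, const Point2D& v1, const Point2D& v2,
                       int minX, int minY, int maxX, int maxY)
{
    // Per-triangle setup: the per-pixel (A..) and per-row (B..) steps
    // of the three edge functions, constant for the whole triangle.
    int A01 = v0.y - v1.y, B01 = v1.x - v0.x;
    int A12 = v1.y - v2.y, B12 = v2.x - v1.x;
    int A20 = v2.y - v0.y, B20 = v0.x - v2.x;

    // Evaluate the edge functions once, at the bounding-box corner.
    // This is the only multiply, so it is the only part that needs
    // the wider per-triangle precision.
    Point2D p = { minX, minY };
    int w0_row = orient2d(v1, v2, p);
    int w1_row = orient2d(v2, v0, p);
    int w2_row = orient2d(v0, v1, p);

    for (p.y = minY; p.y <= maxY; p.y++) {
        int w0 = w0_row, w1 = w1_row, w2 = w2_row;
        for (p.x = minX; p.x <= maxX; p.x++) {
            // p is inside the triangle if all three w values are >= 0.
            if ((w0 | w1 | w2) >= 0)
                drawPixel(p);   // hypothetical output function
            // Per-pixel update: additions only, no multiplies.
            w0 += A12; w1 += A20; w2 += A01;
        }
        // Per-row update: also additions only.
        w0_row += B12; w1_row += B20; w2_row += B01;
    }
}

Per triangle, you would compute minX/minY/maxX/maxY with something like the clipBoundsToViewport sketch earlier and then run this loop; everything inside the loops is add-and-compare, which is why the narrow per-pixel precision suffices.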