GLSL: scalar vs vector performance

All modern GPUs have a scalar architecture, but shading languages offer a variety of vector and matrix types. I would like to know how scalarization or vectorization of GLSL source code affects performance. For example, let's define some "scalar" points:

float p0x, p0y, p1x, p1y, p2x, p2y, p3x, p3y;
p0x = 0.0f; p0y = 0.0f;
p1x = 0.0f; p1y = 0.61f;
p2x = 0.9f; p2y = 0.4f;
p3x = 1.0f; p3y = 1.0f;

and their vector equivalents:

vec2 p0 = vec2(p0x, p0y);
vec2 p1 = vec2(p1x, p1y);
vec2 p2 = vec2(p2x, p2y);
vec2 p3 = vec2(p3x, p3y);

Having these points, which of the following mathematically equivalent pieces of code will run faster?

Scalar code:

float u = t - 1.0; // pow() is undefined in GLSL for a negative base, so the powers are written out
position.x = -p0x*(u*u*u)+p3x*(t*t*t)+p1x*t*(u*u)*3.0-p2x*(t*t)*u*3.0;
position.y = -p0y*(u*u*u)+p3y*(t*t*t)+p1y*t*(u*u)*3.0-p2y*(t*t)*u*3.0;

or its vector equivalent:

float u = t - 1.0;
position.xy = -p0*(u*u*u)+p3*(t*t*t)+p1*t*(u*u)*3.0-p2*(t*t)*u*3.0;

Or will they run equally fast on modern GPUs?

The above code is only an example. Real-life examples of such "vectorizable" code may perform much heavier computations with many more input variables coming from global inputs, uniforms and vertex attributes.

Sergey, asked Sep 14 '16 12:09


1 Answer

The vectorised version is highly unlikely to be slower: in the worst case, the compiler will probably just replace it with the scalar version anyway.
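To illustrate that worst case, here is a minimal sketch of the two forms written as GLSL functions. The names bezierVector and bezierScalarised are hypothetical, and the second function only approximates what a scalarising compiler might produce; it is not actual compiler output.

```glsl
// Hypothetical helper names, introduced only for this illustration.

// Vector form: one expression over vec2 operands.
vec2 bezierVector(vec2 p0, vec2 p1, vec2 p2, vec2 p3, float t)
{
    float s = 1.0 - t;
    // The Bernstein weights are scalars shared by both components.
    float w0 = s * s * s;
    float w1 = 3.0 * t * s * s;
    float w2 = 3.0 * t * t * s;
    float w3 = t * t * t;
    return w0 * p0 + w1 * p1 + w2 * p2 + w3 * p3;
}

// Per-component form: the same arithmetic spelled out for x and y.
vec2 bezierScalarised(vec2 p0, vec2 p1, vec2 p2, vec2 p3, float t)
{
    float s = 1.0 - t;
    float w0 = s * s * s;
    float w1 = 3.0 * t * s * s;
    float w2 = 3.0 * t * t * s;
    float w3 = t * t * t;
    float x = w0 * p0.x + w1 * p1.x + w2 * p2.x + w3 * p3.x;
    float y = w0 * p0.y + w1 * p1.y + w2 * p2.y + w3 * p3.y;
    return vec2(x, y);
}
```

Both functions perform the same multiplications and additions per component, which is why the vector form should cost no more than the hand-written scalar one once it has been scalarised.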

It may, however, be faster. Whether it is depends largely on whether the code branches: branch-free code is easier to spread across multiple SIMD lanes than code that branches. Compilers are pretty smart and might work out that the scalar version can also be sent to multiple SIMD lanes, but a compiler is more likely to do its best work when given the vectorised version. Compilers are also smart enough to sometimes keep the SIMD lanes fed in the presence of limited branching, so even with branching code you are probably better off using the vectorised version.
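As a sketch of what "branching" versus "branch-free" can look like in GLSL, the same selection can be written with an if or with the built-ins step() and mix(). The function names are hypothetical and the inputs are assumed to be finite. Many compilers will turn a branch this small into a select on their own, so treat this as an illustration rather than a guaranteed speed-up.

```glsl
// Hypothetical helper names, introduced only for this illustration.

// Branching form: control flow depends on t.
vec2 pickBranching(vec2 a, vec2 b, float t)
{
    if (t < 0.5)
        return a;
    return b;
}

// Branch-free form: step(0.5, t) is 0.0 when t < 0.5 and 1.0 otherwise,
// so mix() returns a in the first case and b in the second.
// Assumes a and b are finite (mix() multiplies both by the weight).
vec2 pickBranchFree(vec2 a, vec2 b, float t)
{
    return mix(a, b, step(0.5, t));
}
```

Whether either form is actually faster depends on the driver and hardware, so it is worth profiling both on the target GPU.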

barneypitt, answered Oct 20 '22 04:10