When reading through various resources on the Internet, a structure of arrays (SoA) seems to be a very performant way to store your data if you are processing large arrays sequentially.
For example, in C++:
struct CoordFrames
{
    // Each pointer refers to a separate array with one entry per frame.
    float* x_pos;
    float* y_pos;
    float* z_pos;
    float* scaleFactor;
    float* x_quat;
    float* y_quat;
    float* z_quat;
    float* w_quat;
};
This allows faster processing of large arrays (thanks to SIMD) than an array of:
#include <glm/glm.hpp>              // glm::vec3
#include <glm/gtc/quaternion.hpp>   // glm::quat

struct CoordFrame
{
    glm::vec3 position;
    float scaleFactor;
    glm::quat quaternion;
};
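For illustration only, here is a minimal C++ sketch of the kind of loop where the CPU advantage of SoA shows up. To stay self-contained it uses simplified, hypothetical versions of the two structs above (plain floats instead of glm types, std::vector instead of raw pointers). In the SoA version the loop walks one contiguous float array, which compilers can typically auto-vectorize; in the AoS version consecutive x values are 32 bytes apart, so vectorizing would need strided loads.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical simplified AoS element (no glm): 8 floats per frame.
struct CoordFrameAoS {
    float x_pos, y_pos, z_pos, scaleFactor;
    float x_quat, y_quat, z_quat, w_quat;
};

// Hypothetical simplified SoA container: one contiguous array per component.
struct CoordFramesSoA {
    std::vector<float> x_pos, y_pos, z_pos, scaleFactor;
    std::vector<float> x_quat, y_quat, z_quat, w_quat;
};

// SoA: unit-stride access to x_pos, easy for the compiler to vectorize.
void translateX(CoordFramesSoA& frames, float dx) {
    float* x = frames.x_pos.data();
    const std::size_t n = frames.x_pos.size();
    for (std::size_t i = 0; i < n; ++i)
        x[i] += dx;
}

// AoS: each x_pos lies sizeof(CoordFrameAoS) bytes after the previous one.
void translateX(std::vector<CoordFrameAoS>& frames, float dx) {
    for (CoordFrameAoS& f : frames)
        f.x_pos += dx;
}
```

Both overloads compute the same result; only the memory access pattern differs.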
GPUs are processors designed for massively parallel computing, so SIMD is a must-have there. The natural conclusion would be that structures of arrays are most useful on the GPU.
But ...
I have never seen a GLSL shader like this anywhere (and it just feels wrong to me):
#define NUM_POINT_LIGHTS 16
uniform float point_light_x[NUM_POINT_LIGHTS];
uniform float point_light_y[NUM_POINT_LIGHTS];
uniform float point_light_z[NUM_POINT_LIGHTS];
uniform float point_light_radius[NUM_POINT_LIGHTS];
uniform float point_light_color_r[NUM_POINT_LIGHTS];
uniform float point_light_color_g[NUM_POINT_LIGHTS];
uniform float point_light_color_b[NUM_POINT_LIGHTS];
uniform float point_light_power[NUM_POINT_LIGHTS];
Something like this is also rarely seen:
#define NUM_POINT_LIGHTS 16
uniform vec3 point_light_pos[NUM_POINT_LIGHTS];
uniform float point_light_radius[NUM_POINT_LIGHTS];
uniform vec3 point_light_color[NUM_POINT_LIGHTS];
uniform float point_light_power[NUM_POINT_LIGHTS];
Everyone, including me, seems to prefer writing GLSL more like this:
#define NUM_POINT_LIGHTS 16
struct PointLight
{
    vec3 origin;
    float radius;
    vec3 color;
    float power;
};
uniform PointLight pointLights[NUM_POINT_LIGHTS];
Also, while reading the official OpenGL Wiki page about Vertex Array Data, I was surprised to find that interleaved data should suddenly be preferred:
As a general rule, you should use interleaved attributes wherever possible.
Which is true? Are GPUs so highly optimized for the way we like to write shaders that it doesn't really make any difference?
I do not think an SoA layout would help in general, although I currently have no hard numbers.
Many modern GPUs do indeed use an SoA format. However, the "array" part is usually the set of shader invocations executing together, so when you look at a single invocation it is as if you were executing without SIMD. Therefore, especially for uniform variables, an SoA layout of the variables makes no significant performance difference.
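To make that execution model concrete, here is a CPU-side C++ model, not real GPU code: a hypothetical group of kLanes lanes runs the same scalar function, the per-invocation positions are stored SoA across lanes (as a GPU register file would hold them), and every lane simply reads the same uniform PointLight struct. The falloff formula is an arbitrary placeholder.

```cpp
#include <array>
#include <cstddef>

// Hypothetical SIMD width; real hardware uses e.g. 32 or 64 lanes.
constexpr std::size_t kLanes = 8;

// Same member order as the GLSL PointLight struct (AoS, one struct per light).
struct PointLight {
    float origin[3];
    float radius;
    float color[3];
    float power;
};

// Per-invocation data as the hardware holds it:
// one register per component, one slot per lane, i.e. SoA across invocations.
struct LaneVec3 {
    std::array<float, kLanes> x{}, y{}, z{};
};

// What the shader author writes: plain scalar code for ONE invocation.
// Placeholder falloff: power * (1 - d^2 / r^2) inside the radius, 0 outside.
float attenuation(float px, float py, float pz, const PointLight& light) {
    const float dx = light.origin[0] - px;
    const float dy = light.origin[1] - py;
    const float dz = light.origin[2] - pz;
    const float d2 = dx * dx + dy * dy + dz * dz;
    const float r2 = light.radius * light.radius;
    return d2 >= r2 ? 0.0f : light.power * (1.0f - d2 / r2);
}

// What the hardware does: run the same scalar code for every lane.
// The uniform light is identical for all lanes, so its layout does not
// affect vectorization; only the per-lane data is laid out SoA.
std::array<float, kLanes> runWave(const LaneVec3& pos, const PointLight& light) {
    std::array<float, kLanes> result{};
    for (std::size_t lane = 0; lane < kLanes; ++lane)
        result[lane] = attenuation(pos.x[lane], pos.y[lane], pos.z[lane], light);
    return result;
}
```

The "vectorization" happens across lanes, which is why rearranging the uniform struct into separate arrays would not change how the shader itself executes.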
Some other GPUs actually use an AoS layout. For example, Intel Sandy Bridge (the 2011 Core generation) executes 2 vertex shader invocations at the same time on a core but has an 8-wide SIMD unit, so the layout is essentially 2 vec4s. On such hardware, working with vectors can make it easier for the compiler to optimize your code.
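A rough C++ model of that "2 vec4s" arrangement, under the assumption stated above and not actual Intel code: one hypothetical 8-wide register holds the xyzw of two vertices side by side. A vec4 operation written in the shader fills the whole unit, whereas scalar float code would use only one of the four lanes per vertex.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model of one 8-wide SIMD register shared by 2 invocations:
// lanes 0-3 hold vertex A's xyzw, lanes 4-7 hold vertex B's xyzw.
using Simd8 = std::array<float, 8>;

Simd8 pack2(const std::array<float, 4>& a, const std::array<float, 4>& b) {
    return {a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]};
}

// A vec4 multiply written in the shader maps to ONE full-width operation.
Simd8 mul(const Simd8& u, const Simd8& v) {
    Simd8 r{};
    for (std::size_t i = 0; i < 8; ++i)
        r[i] = u[i] * v[i];
    return r;
}

// dot(vec4, vec4) for both vertices: one full-width multiply,
// then a horizontal add within each 4-lane half.
std::array<float, 2> dot4x2(const Simd8& u, const Simd8& v) {
    const Simd8 p = mul(u, v);
    return {p[0] + p[1] + p[2] + p[3], p[4] + p[5] + p[6] + p[7]};
}
```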
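If we look at the benefits of SoA on the CPU, the two major ones are:
1. better cache utilization, because you only load the members you actually use;
2. easy use of SIMD instructions.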
The cache-utilization benefit applies to the GPU just as well. However, you usually tailor your data structures to a single draw operation anyway, so there are often no unused members that you could leave out. It would probably still be wasteful, though, to include an array of materials as AoS when rendering a shadow map, for example.
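As a back-of-the-envelope sketch of that cache effect, the following C++ counts the cache lines a hypothetical pass touches when it needs only origin and radius of each light (as a shadow-map-like pass might). It assumes 64-byte cache lines, 4-byte floats, tightly packed arrays starting on a line boundary, and a C++ mirror of the PointLight struct that is introduced here purely for illustration.

```cpp
#include <cstddef>

// Hypothetical C++ mirror of the GLSL PointLight struct: 8 floats = 32 bytes.
struct PointLight {
    float origin[3];
    float radius;
    float color[3];
    float power;
};
static_assert(sizeof(PointLight) == 32, "expected 8 tightly packed floats");

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes

// Number of cache lines covered by a contiguous, line-aligned array.
constexpr std::size_t linesFor(std::size_t bytes) {
    return (bytes + kCacheLine - 1) / kCacheLine;
}

// AoS: every 64-byte line holds two structs, each starting with origin and
// radius, so reading just those members still touches every line.
constexpr std::size_t linesAoS(std::size_t lights) {
    return linesFor(lights * sizeof(PointLight));
}

// SoA: the pass touches only the origin array (3 floats each) and the radius array.
constexpr std::size_t linesSoAOriginRadius(std::size_t lights) {
    return linesFor(lights * 3 * sizeof(float)) + linesFor(lights * sizeof(float));
}
```

For 16 lights this gives 8 lines (512 bytes) for AoS versus 4 lines (256 bytes) for SoA: half the traffic, but tiny in absolute terms.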
SIMD is much less of a concern: from the perspective of a single shader invocation you are not really using SIMD, so there are no restrictions on your loads and stores. Depending on the architecture there may be instructions that load multiple elements at once, but on the AMD GCN architecture, for example, each loaded value can be used individually afterwards, so you can simply load an entire struct and use its members.
My guess is that if you are compute-bound the layout does not really matter, and if you are bandwidth-bound you should reduce the amount of data you load; an SoA layout is one possible way to achieve that.
For an array of just 16 lights, though, I would not worry: it is small and will probably not consume significant bandwidth.
As for interleaved attributes, the benefit is probably very GPU-dependent. On Sandy Bridge, for example, which runs 2 vertex shader invocations at a time, interleaving gives much better locality for those two vertices.
However, on AMD GCN, where a single core executes 64 shader invocations together, you will probably get good locality even without interleaving, as each attribute array should fill entire cache lines (assuming the vertices are close together in the buffer when you do indexed rendering).
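The locality argument can be checked with a small C++ model that counts the distinct cache lines touched when a batch of consecutive vertices fetches all of its attributes. Everything here is an assumption for illustration: a hypothetical vertex format (position 12 B, normal 12 B, uv 8 B), a 1024-vertex buffer, 64-byte cache lines, and separate attribute arrays placed back to back on line boundaries.

```cpp
#include <cstddef>
#include <set>

constexpr std::size_t kLine = 64;           // assumed cache-line size
constexpr std::size_t kVertexCount = 1024;  // hypothetical buffer size

// Hypothetical vertex format: position (12 B), normal (12 B), uv (8 B).
constexpr std::size_t kPos = 12, kNormal = 12, kUv = 8;
constexpr std::size_t kStride = kPos + kNormal + kUv;  // 32 B when interleaved

// Record every cache line covered by bytes [start, start + len).
void touch(std::set<std::size_t>& lines, std::size_t start, std::size_t len) {
    for (std::size_t l = start / kLine; l <= (start + len - 1) / kLine; ++l)
        lines.insert(l);
}

// Interleaved: one buffer, vertex v occupies [v * kStride, (v + 1) * kStride).
std::size_t linesInterleaved(std::size_t first, std::size_t count) {
    std::set<std::size_t> lines;
    for (std::size_t v = first; v < first + count; ++v)
        touch(lines, v * kStride, kStride);
    return lines.size();
}

// Non-interleaved: three tightly packed attribute arrays, back to back.
std::size_t linesSeparate(std::size_t first, std::size_t count) {
    const std::size_t normalBase = kVertexCount * kPos;
    const std::size_t uvBase = normalBase + kVertexCount * kNormal;
    std::set<std::size_t> lines;
    for (std::size_t v = first; v < first + count; ++v) {
        touch(lines, v * kPos, kPos);
        touch(lines, normalBase + v * kNormal, kNormal);
        touch(lines, uvBase + v * kUv, kUv);
    }
    return lines.size();
}
```

For 2 vertices, interleaving touches 1 line versus 3 for separate arrays. For 64 consecutive vertices both layouts touch 32 lines, because each attribute array then fills whole lines on its own.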
Just remember that performance characteristics vary between GPUs and drivers, and with what you are trying to do. Nothing beats a good benchmark for your specific problem.