I have not yet used more complicated constant buffers like this one but, from what I understand, my C++ alignment and packing have to match what HLSL expects. So I'm trying to figure out the rules so that I can predictably lay out the C++ struct to match.
I was doing some tests in a Vertex Shader v5 to see the packing produced in the output, and used this structure in vs.hlsl:
cbuffer conbuf {
    float m0;
    float m1;
    float4 m2;
    bool m3[1];
    bool m4[4];
    float4 m5;
    float m6;
    float4 m7;
    matrix m8;
    float m9;
    float m10;
    float4 m11[2];
    float m12[8];
    float m13;
};
which produced the following output (in the file set under Header File Name in the VC++ project HLSL settings):
cbuffer conbuf {
    float m0;       // Offset:   0 Size:   4
    float m1;       // Offset:   4 Size:   4
    float4 m2;      // Offset:  16 Size:  16
    bool m3;        // Offset:  32 Size:   4
    bool m4[4];     // Offset:  48 Size:  52
    float4 m5;      // Offset: 112 Size:  16
    float m6;       // Offset: 128 Size:   4
    float4 m7;      // Offset: 144 Size:  16
    float4x4 m8;    // Offset: 160 Size:  64
    float m9;       // Offset: 224 Size:   4
    float m10;      // Offset: 228 Size:   4
    float4 m11[2];  // Offset: 240 Size:  32
    float m12[8];   // Offset: 272 Size: 116
    float m13;      // Offset: 388 Size:   4
};
I have pretty much figured out how the offsets work (based on the sizes), but I cannot understand the array sizes. Some of them seem random: I can't figure out how the bool m4[4] array has size 52, and likewise float m12[8], which has size 116. How does the HLSL compiler arrive at these sizes?
Any help? I've already looked at the MSDN packing rules page, but it doesn't say much about arrays.
I'll simplify your example a little, since you already understand padding.
One important rule for arrays, per the packing rules page you mentioned, is:
Arrays are not packed in HLSL by default. To avoid forcing the shader to take on ALU overhead for offset computations, every element in an array is stored in a four-component vector.
So let's take this simple cbuffer:
cbuffer cbPerObj : register( b0 )
{
    float Alpha[4];
};
Per the rule above (each float is stored in its own four-component vector), this would be (almost) equivalent to:
cbuffer cbPerObj : register( b0 )
{
    float4 Alpha[4];
};
Or, expanded:
cbuffer cbPerObj : register( b0 )
{
    float Alpha1;
    float3 Dummy1;
    float Alpha2;
    float3 Dummy2;
    float Alpha3;
    float3 Dummy3;
    float Alpha4;
};
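On the C++ side, the expanded layout could be mirrored roughly as follows. This is only a sketch: the struct name CbPerObj and the Dummy pad members are hypothetical, and it assumes a 4-byte float with no compiler-inserted padding between float members.

```cpp
#include <cstddef>  // offsetof

// Hypothetical C++ mirror of the expanded cbPerObj layout.
// Each array element sits at the start of its own 16-byte register;
// the last element has no trailing padding.
struct CbPerObj {
    float Alpha1;
    float Dummy1[3];  // pads Alpha1 out to 16 bytes
    float Alpha2;
    float Dummy2[3];
    float Alpha3;
    float Dummy3[3];
    float Alpha4;     // last element: not padded
};

static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");
static_assert(offsetof(CbPerObj, Alpha2) == 16, "second element starts a new register");
static_assert(offsetof(CbPerObj, Alpha3) == 32, "third element starts a new register");
static_assert(offsetof(CbPerObj, Alpha4) == 48, "fourth element starts a new register");
static_assert(sizeof(CbPerObj) == 52, "3 padded elements (48) + 1 unpadded float (4)");
```

The static_assert lines make the build fail if the C++ layout ever drifts from the offsets the shader expects.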
Notice that the last element is not padded. That is why you see this in your case:
bool m4[4]; // Offset: 48 Size: 52
float4 m5; // Offset: 112 Size: 16
Fully padded, m4 would take 16 * 4 = 64 bytes, but the last element drops its 3 padding floats (12 bytes), so 64 - 12 = 52. The same rule gives float m12[8] its size: 16 * 8 - 12 = 116.
Also note that 48 + 52 = 100, yet m5 starts at 112: a float4 must not cross a 16-byte boundary, so it moves to the next one, and that is where the 12 "lost" bytes go.
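The size arithmetic can be written as a small helper and checked against the sizes reported in the question. This is a sketch, not anything the compiler exposes: the names roundUpToRegister and reportedArraySize are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: round a byte count up to a whole 16-byte register.
constexpr std::size_t roundUpToRegister(std::size_t bytes) {
    return (bytes + 15) / 16 * 16;
}

// Hypothetical helper: the size reported for an array in a cbuffer.
// Every element except the last occupies whole registers; the last
// element only occupies its own size.
constexpr std::size_t reportedArraySize(std::size_t elementBytes, std::size_t count) {
    return count == 0 ? 0
                      : (count - 1) * roundUpToRegister(elementBytes) + elementBytes;
}

// Values taken from the compiler output in the question.
static_assert(reportedArraySize(4, 1) == 4, "bool m3[1]");
static_assert(reportedArraySize(4, 4) == 52, "bool m4[4]");
static_assert(reportedArraySize(16, 2) == 32, "float4 m11[2]");
static_assert(reportedArraySize(4, 8) == 116, "float m12[8]");
```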
If you instead had:
bool m4[4]; // Offset: 48 Size: 52
float m5;
then the offset of m5 would be 100, since a single float fits before the next 16-byte boundary.
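The boundary rule for the member that follows the array can be sketched the same way. The helper name nextMemberOffset is hypothetical, and the sketch only covers scalar and vector members of at most 16 bytes; arrays are a separate case (in the output above, m4 starts on a fresh register at 48 even though m3 ends at 36).

```cpp
#include <cstddef>

// Hypothetical helper: offset of a scalar/vector member of `size` bytes
// (size <= 16) when the previous member ended at byte `cursor`.
constexpr std::size_t nextMemberOffset(std::size_t cursor, std::size_t size) {
    // If the first and last byte fall in the same 16-byte register, the
    // member stays where it is; otherwise it starts at the next register.
    return (cursor / 16 == (cursor + size - 1) / 16)
               ? cursor
               : (cursor + 15) / 16 * 16;
}

// m4 ends at 48 + 52 = 100.
static_assert(nextMemberOffset(100, 16) == 112, "float4 m5 is pushed to the next register");
static_assert(nextMemberOffset(100, 4) == 100, "float m5 fits in the current register");
// Further checks against the question's output.
static_assert(nextMemberOffset(8, 16) == 16, "float4 m2 after m0 and m1");
static_assert(nextMemberOffset(132, 16) == 144, "float4 m7 after float m6");
```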
Hope that makes sense.