I'm writing CG shaders for advanced lighting calculation for game based on Unity. Sometimes it is needed to sum all vector components. There are two ways to do it:
I am really curious about what is faster and looks better for code style.
It's obvious that if we have same question for CPU calculations, the first simle way is much better. Because of:
a) There is no need to allocate another float(1,1,1) vector
b) There is no need to multiply every original vector "v" components by 1.
But since we do it in shader code, which runs on GPU, I belive there is some great hardware optimization for dot product function, and may be allocation of float3(1,1,1) will be translated in no allocation at all.
float4 _someVector;
void surf (Input IN, inout SurfaceOutputStandard o){
float sum = _someVector.x + _someVector.y + _someVector.z + _someVector.w;
// VS
float sum2 = dot(_someVector, float4(1,1,1,1));
}
Check this link.
Vec3 Dot has a cost of 3 cycles, while Scalar Add has a cost of 1. Thus, in almost all platforms (AMD and NVIDIA):
float sum = v.x + v.y + v.z;
has a cost of 2
float sum = dot(v,float3(1,1,1));
has a cost of 3
The first implementation should be faster.
Implementation of the Dot product in cg: https://developer.download.nvidia.com/cg/dot.html
IMHO difference is immeasurable, in 98% of the cases, but first one should be faster, because multiplication is a "more expensive" operation
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With