In the following line:
aData[i] = aData[i] + ( aOn * sin( i ) );
If aOn is 0 or 1, does the processor actually perform the multiplication, or does it conditionally work out the result (0 for 0, the other operand for 1)?
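A side note, not from the question but standard IEEE-754 behaviour: for floating-point, a multiply by 0 is not something the hardware (or the compiler, for an unknown operand) can simply skip, because 0 * infinity and 0 * NaN are NaN rather than 0. A minimal illustration:

```cpp
#include <cmath>
#include <limits>

// 0.0f * x cannot be folded to 0.0f for an unknown x: IEEE-754 defines
// 0 * infinity and 0 * NaN as NaN, so the multiply must actually execute.
float times_zero( float x )
{
    return 0.0f * x;
}
```

So even `aOn * sin( i )` with aOn equal to 0 is a genuine multiplication as far as the FPU is concerned.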
I'm looking into algorithm performance consistency, which partly involves a look into the effect of Branch Prediction.
The hypothesis is that this code:
for ( i = 0; i < iNumSamples; i++ )
aData[i] = aData[i] + ( aOn * sin( i ) );
will provide more stable performance than this code (where branch prediction may destabilise performance):
for ( i = 0; i < iNumSamples; i++ )
{
if ( aOn )
aData[i] = aData[i] + sin( i );
}
with aOn being either 0 or 1; it may be toggled by another thread while the loop is executing.
The actual conditional calculation (+ sin( i ) in the example above) involves more processing, and the if condition must stay inside the loop (there is a multitude of conditions, not just the single one in the example; also, changes to aOn should take effect immediately, not once per loop pass).
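Since aOn is written by another thread mid-loop, reading it as a plain int is a data race in C++. A hedged sketch (the names and structure here are assumptions, not from the question) that keeps the branchless form while making each per-iteration read well-defined via std::atomic:

```cpp
#include <atomic>
#include <cmath>
#include <cstddef>
#include <vector>

// Shared flag, toggled by a controlling thread; read atomically so each
// loop iteration observes the latest value without undefined behaviour.
std::atomic<int> aOn{1};

void process( std::vector<float>& aData )
{
    for ( std::size_t i = 0; i < aData.size(); i++ )
    {
        // relaxed load: we only need the current value, not ordering
        float on = static_cast<float>( aOn.load( std::memory_order_relaxed ) );
        aData[i] = aData[i] + on * std::sin( static_cast<float>( i ) );
    }
}
```

On common platforms the atomic load of an int compiles to an ordinary load, so the branchless structure of the loop is preserved.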
Ignoring performance consistency, the performance tradeoff between the two options comes down to the cost of executing the if statement versus that of a multiplication.
Either way, it is easy to see that if a processor skipped the actual multiplication for values like 1 and 0, the first option would be a win-win (no branch prediction, better performance).
Binary long multiplication: This method is mathematically correct and has the advantage that a small CPU may perform the multiplication by using the shift and add features of its arithmetic logic unit rather than a specialized circuit. The method is slow, however, as it involves many intermediate additions.
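The shift-and-add scheme described above can be sketched in a few lines (illustrative only; real hardware does this in the ALU, not in software):

```cpp
#include <cstdint>

// Illustrative shift-and-add ("binary long") multiplication: for each set
// bit of b, add a correspondingly shifted copy of a. A small CPU can do
// this with only shift and add operations, at the cost of one add per
// set bit, which is why the method is slow.
std::uint32_t shift_add_mul( std::uint32_t a, std::uint32_t b )
{
    std::uint32_t result = 0;
    while ( b != 0 )
    {
        if ( b & 1u )       // lowest remaining bit of b is set
            result += a;    // add the shifted copy of a
        a <<= 1;            // move to the next bit position
        b >>= 1;
    }
    return result;
}
```

Note that even here a multiplier of 0 or 1 is not special-cased; the loop simply runs its course.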
For most modern processors the standard integer operations take just one clock cycle, the main exceptions being multiplication and division (where available). Multiplication typically takes around 6 cycles, and division often 30 to 60 cycles.
High-performance code often favors multiplication over division because multiplication is faster on most modern processors.
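A common consequence of that cost gap is hoisting a division out of a loop by precomputing the reciprocal once (a sketch under the assumption that the divisor is loop-invariant; for floats the results may differ in the last bit from true division):

```cpp
// Sketch: turn N divisions into 1 division plus N multiplications by
// precomputing the reciprocal outside the loop.
void scale( float* data, int n, float divisor )
{
    const float inv = 1.0f / divisor;  // one expensive division, done once
    for ( int i = 0; i < n; i++ )
        data[i] *= inv;                // one cheap multiply per element
}
```

For a divisor like 2 the reciprocal is exact, so the results match division bit-for-bit; for arbitrary divisors they may not.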
Processors perform the regular multiplication even for 0 and 1.
The reason is that if the processor checked for 0 and 1 before every multiplication, the extra check itself would cost cycles. You would gain performance for 0 and 1 multipliers, but lose it for every other value (which is far more probable).
A simple program can prove this:
#include <cstdio>
#include <cstdlib>
#include <ctime>
void Loop( float aCoefficient )
{
    // volatile, so the multiply-accumulate cannot be optimized away
    volatile float iSum = 0.0f;
    clock_t iStart, iEnd;
    iStart = clock();
    for ( int i = 0; i < 100000000; i++ )
    {
        iSum += aCoefficient * rand();
    }
    iEnd = clock();
    printf( "Coefficient: %f: %li clock ticks\n", aCoefficient, (long)( iEnd - iStart ) );
}
int main(int argc, const char * argv[])
{
Loop( 0.0f );
Loop( 1.0f );
Loop( 0.25f );
return 0;
}
For which the output is:
Coefficient: 0.000000: 1380620 clock ticks
Coefficient: 1.000000: 1375345 clock ticks
Coefficient: 0.250000: 1374483 clock ticks