 

Do processors actually calculate multiplication by a zero or one? Why?

The Short Version

In the following line:

aData[i] = aData[i] + ( aOn * sin( i ) );

If aOn is 0 or 1, does the processor actually perform the multiplication, or does it conditionally work out the result (0 when aOn is 0, sin( i ) when aOn is 1)?

The Long Version

I'm investigating algorithm performance consistency, which partly involves looking at the effect of branch prediction.

The hypothesis is that this code:

for ( i = 0; i < iNumSamples; i++ )
    aData[i] = aData[i] + ( aOn * sin( i ) );

will provide more stable performance than this code (where branch prediction may destabilise performance):

for ( i = 0; i < iNumSamples; i++ )
{
    if ( aOn )
        aData[i] = aData[i] + sin( i );
}

with aOn being either 0 or 1; it can be toggled by another thread while the loop is executing.

In reality, the conditional calculation (+ sin( i ) in the example above) involves more processing, and the if condition has to be inside the loop (there are many conditions, not just one as in the example above; also, changes to aOn should take effect immediately, not once per loop pass).
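
As a side note, for the flag toggling to be well-defined when another thread flips it mid-loop, the flag would need to be read through a synchronised type. Below is a minimal sketch of how the first (branchless) variant could look, assuming aOn is wrapped in std::atomic; the wrapper and the function name Process are illustrative additions, not part of the original code:

#include <atomic>
#include <cmath>

std::atomic<int> aOn( 1 );   // toggled by another thread at any time

void Process( float* aData, int iNumSamples )
{
    for ( int i = 0; i < iNumSamples; i++ )
    {
        // The flag is re-read on every iteration, so a toggle from the
        // other thread takes effect immediately, not once per loop pass.
        aData[i] = aData[i] + ( aOn.load( std::memory_order_relaxed ) * std::sin( (double)i ) );
    }
}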

Ignoring performance consistency, the tradeoff between the two options comes down to the time it takes to execute the if statement versus the time it takes to perform the multiplication.

Regardless, it is easy to see that if a processor did not actually perform the multiplication for values like 1 and 0, the first option would be a win-win (no branch prediction involved, and better performance).
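
One way to measure that tradeoff directly would be to time the two variants against each other. Here is a minimal sketch; the buffer size, the use of std::vector, and the function names are assumptions for illustration, and it should be compiled without optimisations so the compiler does not transform the loops:

#include <cmath>
#include <cstdio>
#include <ctime>
#include <vector>

// Branchless variant: always multiply sin( i ) by the flag.
void LoopMultiply( std::vector<float>& aData, int aOn )
{
    int iNumSamples = (int)aData.size();
    for ( int i = 0; i < iNumSamples; i++ )
        aData[i] = aData[i] + ( aOn * std::sin( (double)i ) );
}

// Branching variant: only add sin( i ) when the flag is set.
void LoopBranch( std::vector<float>& aData, int aOn )
{
    int iNumSamples = (int)aData.size();
    for ( int i = 0; i < iNumSamples; i++ )
    {
        if ( aOn )
            aData[i] = aData[i] + std::sin( (double)i );
    }
}

int main()
{
    std::vector<float> aData( 10000000, 0.0f );

    clock_t iStart = clock();
    LoopMultiply( aData, 1 );
    printf( "Multiply: %li clock ticks\n", (long)( clock() - iStart ) );

    iStart = clock();
    LoopBranch( aData, 1 );
    printf( "Branch:   %li clock ticks\n", (long)( clock() - iStart ) );

    return 0;
}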

Asked Jul 08 '13 by Izhaki

1 Answer

Processors perform the regular multiplication even when the multiplier is 0 or 1.

The reason is that if the processor checked for 0 and 1 before every calculation, the check itself would take extra cycles. You would gain performance for 0 and 1 multipliers, but lose it for every other value (which are far more probable).

A simple program can prove this:

#include <cstdio>
#include <cstdlib>
#include <ctime>

// Times 100 million multiplications by the given coefficient.
void Loop( float aCoefficient )
{
    float iSum = 0.0f;

    clock_t iStart, iEnd;

    iStart = clock();
    for ( int i = 0; i < 100000000; i++ )
    {
        // rand() keeps the multiplicand unpredictable, so the multiply
        // cannot be folded away at compile time.
        iSum += aCoefficient * rand();
    }
    iEnd = clock();
    printf( "Coefficient: %f: %li clock ticks\n", aCoefficient, (long)( iEnd - iStart ) );
}

int main( int argc, const char * argv[] )
{
    Loop( 0.0f );
    Loop( 1.0f );
    Loop( 0.25f );

    return 0;
}

For which the output is:

Coefficient: 0.000000: 1380620 clock ticks
Coefficient: 1.000000: 1375345 clock ticks
Coefficient: 0.250000: 1374483 clock ticks 
Answered Oct 12 '22 by Izhaki