Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to understand the tricky speed up

Sorry for may be too abstract question, but for me it is quite practical + may be some experts had similar experience and can explain it.

I have a big code, about 10000 lines size.

I notices that if in a certain place I put

if ( expression ) continue;

where expression is always false (double checked with logic of code and cout), but depends on unknown parameters (so compiler can't simply rid of this line during compilation) the speed of the program is increased by 25% (the result of calculation are the same). If I measure speed of the loop itself the speed up factor is bigger than 3.

Why can this happen and what is possible ways to use this speed up possibility without such tricks?

P.S. I use gcc 4.7.3, -O3 optimisation.


More info:

  1. I have tried two different expressions, both works.

  2. If I change the line to:

    if ( expression ) { cout << " HELLO " << endl; continue; };
    

    the speed up is gone.

  3. If I change the line to:

    expression;
    

    the speed up is gone.

  4. The code, which surrounds the line looks like this:

    for ( int i = a; ;  ) {
      do {
        i += d;
        if ( d*i > d*ilast ) break;
    
          // small amount of calculations, and conditional calls of continue;
    
      } while ( expression0 );
      if ( d*i > dir*ilast ) break;
    
      if ( expression ) continue;
    
       // very big amount calculations, and conditional calls of continue;
    
    }
    

    the for loop looks strange. It is because I have modified the loops in order to catch this bottle neck. Initially expression was equal to expression0 and instead of do-loop I had only this continue.

  5. I tried use __builtin_expect in order to understand branch prediction. With

      // the expression (= false) is supposed to be true by branch prediction.
    if ( __builtin_expect( !!(expression), 1) ) continue; 
    

    the speed up is 25%.

      // the expression (= false) is supposed to be false by branch prediction.
    if ( __builtin_expect( !!(expression), 0) ) continue; 
    

    the speed up is gone.

  6. If I use -O2 instead of -O3 the effect is gone. The code is slightly (~3%) slower than the fast O3-version with the false condition.

  7. Same for "-O2 -finline-functions -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-vectorize". With one more option: "-O2 -finline-functions -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fipa-cp-clone" the effect is amplified. With "the line" the speed is same, without "the line" the code is 75% slower.

  8. The reason is in just following conditional operator. So the code looks like this:

    for ( int i = a; ;  ) {
    
          // small amount of calculations, and conditional calls of continue;
    
      if ( expression ) continue;
    
        // calculations1
    
      if ( expression2 ) {
        // calculations2
      }
    
       // very big amount calculations, and conditional calls of continue;
    
    }
    

    The value of expression2 is almost always false. So I changed it like this:

    for ( int i = a; ;  ) {
    
          // small amount of calculations, and conditional calls of continue;
    
      // if ( expression ) continue; // don't need this anymore
    
        // calculations1
    
      if ( __builtin_expect( !!(expression2), 0 ) ) { // suppose expression2 == false
        // calculations2
      }
    
       // very big amount calculations, and conditional calls of continue;
    
    }
    

    And have got desired 25% speed up. Even a little bit more. And behaviour no longer depends on the critical line.

If somebody knows materials, which can explain this behaviour without guesses I will be very glad to read and accept their answer.

like image 529
klm123 Avatar asked Oct 28 '13 16:10

klm123


1 Answers

Found it.

The reason was in the just following conditional operator. So the code looks like this:

for ( int i = a; ;  ) {

      // small amount of calculations, and conditional calls of continue;

  if ( expression ) continue;

    // calculations1

  if ( expression2 ) {
    // calculations2
  }

   // very big amount calculations, and conditional calls of continue;

}

The value of expression2 is almost always false. So I changed it like this:

for ( int i = a; ;  ) {

      // small amount of calculations, and conditional calls of continue;

  // if ( expression ) continue; // don't need this anymore

    // calculations1

  if ( __builtin_expect( !!(expression2), 0 ) ) { // suppose expression2 == false
    // calculations2
  }

   // very big amount calculations, and conditional calls of continue;

}

And have got desired 25% speed up. Even a little bit more. And behaviour no longer depends on the critical line.


I'm not sure how to explain it and can't find enough material on branch prediction.

But I guess the point is that calculations2 should be skipped, but compiler doesn't know about this and suppose expression2 == true by default. Meanwhile it suppose that in the simple continue-check

if ( expression ) continue;

expression == false, and nicely skips calculations2 as has to be done in any case. In case when under if we have more complicated operations (for example cout) it suppose that expression is true and the trick doesn't work.

If somebody knows materials, which can explain this behaviour without guesses I will be very glad to read and accept their answer.

like image 100
klm123 Avatar answered Nov 16 '22 03:11

klm123