Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does gcc optimize my cycle with condition?

I have the following cycle:

//condition will be set here to true or false

for (int i = 0; i < LARGE_NUMBER; i++) {
    if (condition) {
        //do foo
    } else {
        //do bar
    }
}

Assumption: A cycle if faster without a condition than with a condition. (Is this true?) Question: Will gcc factor out my if, if condition has been set outside the for cycle, and the cycle itself doesn't touch condition?

If not, I should switch the if and the for, duplicate code, violate DRY, etc.

like image 810
erenon Avatar asked Apr 02 '11 11:04

erenon


3 Answers

For those who don't wish to read a lengthy post, this optimization is called (in LLVM) Loop Unswitch.

Why not ask a compiler ?

void foo(char* c);

int main(int argc, char **argv) {
  bool const condition = argc % 2;

  for (int i = 0; i != argc; ++i) {
    if (condition) {
      foo(argv[1]);
    } else {
      foo(argv[0]);
    }
  }
  return 0; 
}

Is transformed into SSA form (via LLVM try out):

define i32 @main(i32 %argc, i8** nocapture %argv) {
entry:
  %0 = icmp eq i32 %argc, 0                       ; <i1> [#uses=1]
  br i1 %0, label %bb5, label %bb.nph

bb.nph:                                           ; preds = %entry
  %1 = and i32 %argc, 1                           ; <i32> [#uses=1]
  %toBool = icmp eq i32 %1, 0                     ; <i1> [#uses=1]
  %2 = getelementptr inbounds i8** %argv, i64 1   ; <i8**> [#uses=1]
  br i1 %toBool, label %bb3.us, label %bb3

bb3.us:                                           ; preds = %bb3.us, %bb.nph
  %i.07.us = phi i32 [ %4, %bb3.us ], [ 0, %bb.nph ] ; <i32> [#uses=1]
  %3 = load i8** %argv, align 8                   ; <i8*> [#uses=1]
  tail call void @_Z3fooPc(i8* %3)
  %4 = add nsw i32 %i.07.us, 1                    ; <i32> [#uses=2]
  %exitcond = icmp eq i32 %4, %argc               ; <i1> [#uses=1]
  br i1 %exitcond, label %bb5, label %bb3.us

bb3:                                              ; preds = %bb3, %bb.nph
  %i.07 = phi i32 [ %6, %bb3 ], [ 0, %bb.nph ]    ; <i32> [#uses=1]
  %5 = load i8** %2, align 8                      ; <i8*> [#uses=1]
  tail call void @_Z3fooPc(i8* %5)
  %6 = add nsw i32 %i.07, 1                       ; <i32> [#uses=2]
  %exitcond8 = icmp eq i32 %6, %argc              ; <i1> [#uses=1]
  br i1 %exitcond8, label %bb5, label %bb3

bb5:                                              ; preds = %bb3, %bb3.us, %entry
  ret i32 0
}

Not too readable perhaps, so let me point out what's here:

  • entry: check if argc is equal to 0, if it is, go to bb5 (exit) else go to bb.nph
  • bb.nph: compute the value of condition, if it's true, go to bb3.us else go to bb3
  • bb3.us and bb3: loops for the true and false condition respectively
  • bb5: exit

A compiler can pretty much transform your code how it wants, as long as the effect is similar to what you asked for. In this case, it has effectively rewritten the code as:

int main(int argc, char**argv) {
  if (argc != 0)
  {
    int i = 0;
    if (argc % 2) {
      do {
        foo(argv[1]);
        ++i;
      } while (i != argc);
    } else {
      do {
        foo(argv[0]);
        ++i;
      } while (i != argc);
    }
  }
  return 0;
}

It's a form of Loop Invariant Optimization, combined here with a first check to avoid computing the condition if the loop is not going to get executed.

For those of us who would think the first solution is clearer, we're quite happy to have the compiler do the nitty gritty optimization for us!

like image 190
Matthieu M. Avatar answered Nov 06 '22 13:11

Matthieu M.


Any decent optimizing compiler will do this if condition can be proved to not change during the iteration.

Furthermore, even if the compiler does not in fact do this, you would do well to support your decision to rewrite the code to a less-human-readable form with hard data from profiling. Don't optimize prematurely. It isn't worth it to give readers of the code a "huh?" moment in order to shave off a few milliseconds (and "readers" definitely includes yourself at a future time).

like image 23
Jon Avatar answered Nov 06 '22 15:11

Jon


I wouldn't advocate to take any action here, by the usual "premature optimization" arguments. Keeping the code clear is most important, and if the whole program is too slow, you may want to profile and find the actual bottlenecks (that you usually can't guess) after the program has been thoroughly debugged.

Even if the compiler does not optimize this particular case for you, you may want to know that CPU do some form of branch prediction which will greatly reduce the time needed to process the condition in case the condition is predictable.

Indeed, most CPU process instructions in a pipeline and by the time the jump address has to be determined, the condition variable may be unknown. This would cause a pipeline stall, and this is where most modern processors try to guess (smartly in fact) where the program will jump. If the condition variable is indeed known (as it is in your case), the guess will be perfect.

So I doubt that even with a "dumb" compiler, you would actually see a difference between the two options, at least on modern machines.

like image 5
Alexandre C. Avatar answered Nov 06 '22 14:11

Alexandre C.