Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How exactly does gcc do optimizations?

In order to know how exactly the gcc do the optimization, I have written two program compiling with -O2, but there is some difference of the assembly code. In my programs, I want to output "hello" in the loop, and add some delay between each output. These two programs are only for illustrating my question, and I know I can using volatile or asm in program 1 to achieve my purpose.

Program 1

#include <stdio.h>

int main(int argc, char **argv)
{
    unsigned long i = 0;
    while (1) {
        if (++i > 0x1fffffffUL) {
            printf("hello\n");
            i = 0;
        }
    }
}

Compile with -O2, the assembly code is:

Disassembly of section .text.startup:

00000000 <_main>:
#include <stdio.h>

int main(int argc, char **argv)
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 e4 f0                and    $0xfffffff0,%esp
   6:   83 ec 10                sub    $0x10,%esp
   9:   e8 00 00 00 00          call   e <_main+0xe>
   e:   66 90                   xchg   %ax,%ax
  10:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  17:   e8 00 00 00 00          call   1c <_main+0x1c>
  1c:   eb f2                   jmp    10 <_main+0x10>
  1e:   90                      nop
  1f:   90                      nop

Program 2

int main(int argc, char **argv)
{
    unsigned long i = 0;
    while (1) {
        if (i > 0x1fffffffUL) {
            printf("hello\n");
            i = 0;
        }
        i++;
    }
}

Compile with -O2, the assembly code is:

Disassembly of section .text.startup:

00000000 <_main>:
#include <stdio.h>

int main(int argc, char **argv)
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 e4 f0                and    $0xfffffff0,%esp
   6:   83 ec 10                sub    $0x10,%esp
   9:   e8 00 00 00 00          call   e <_main+0xe>
   e:   31 c0                   xor    %eax,%eax
  10:   83 c0 01                add    $0x1,%eax
  13:   3d ff ff ff 1f          cmp    $0x1fffffff,%eax
  18:   76 f6                   jbe    10 <_main+0x10>
  1a:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
    while (1) {
        if (i > 0x1fffffffUL) {
            printf("hello\n");
            i = 0;
        }
        i++;
  21:   e8 00 00 00 00          call   26 <_main+0x26>

int main(int argc, char **argv)
{
    unsigned long i = 0;
    while (1) {
        if (i > 0x1fffffffUL) {
  26:   31 c0                   xor    %eax,%eax
  28:   eb e6                   jmp    10 <_main+0x10>
            printf("hello\n");
  2a:   90                      nop
  2b:   90                      nop
  2c:   90                      nop
  2d:   90                      nop
  2e:   90                      nop
  2f:   90                      nop

In program 1, the increase of i is optimized out, but it's not in program 2. Why this happens? What rules is gcc using when optimizing with -O2 for these two programs?

like image 949
Ivan Avatar asked Mar 01 '16 10:03

Ivan


People also ask

What optimization does GCC do?

The compiler optimizes to reduce the size of the binary instead of execution speed. If you do not specify an optimization option, gcc attempts to reduce the compilation time and to make debugging always yield the result expected from reading the source code.

How do compiler optimizations work?

Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster.

How does GCC compiler work?

After the file is preprocessed, gcc moves it to the compiler. The compiler turns each line in the preprocessed file into assembly language, which are instructions in English mnemonics that have strong one-to-one correspondence to the machine code that computers can read.

How does the C++ compiler optimize?

The C/C++ compiler compiles each source file separately and produces the corresponding object file. This means the compiler can only apply optimizations on a single source file rather than on the whole program. However, some important optimizations can be performed only by looking at the whole program.


2 Answers

Asking "why" about optimizers is usually a waste of time, because there are no "rules" by which optimizers operate -- other than "as if": The optimizer may not change the observable behaviour of conforming code.

The "observable behaviour" of both your programs is to print "hello" repeatedly.

In your first program, the counting is optimized away, making the observable behaviour happen faster. That is the job of an optimizer. Be happy your code is more efficient now!

In your second program, the counting is not optimized away, because somehow the optimizer -- in this version of this compiler with this setting -- did not see that it could do without it. Why? Who knows (other than the maintainer of the compiler's optimizer module)?

If your desired behaviour is to have a delay between outputs, use something like thrd_sleep(). Empty count loops were a way to delay BASIC 2.0 programs on the C64, but they should not be used in C, for the exact reason you just observed: You never know what the optimizer does.

like image 102
DevSolar Avatar answered Sep 23 '22 00:09

DevSolar


The branching in the if statement now depends on something that happened in the previous iteration of the loop. In particular, the compiler can easily determine in program 1 that i is incremented in every iteration of the while loop (as it is right at the top), while this is not the case in program 2.

Anyway, compiler optimizations are very complicated. See below:

gcc -O2 is a shortcut for these flags: (from the documentation)

      -fauto-inc-dec 
      -fbranch-count-reg 
      -fcombine-stack-adjustments 
      -fcompare-elim 
      -fcprop-registers 
      -fdce 
      -fdefer-pop 
      -fdelayed-branch 
      -fdse 
      -fforward-propagate 
      -fguess-branch-probability 
      -fif-conversion2 
      -fif-conversion 
      -finline-functions-called-once 
      -fipa-pure-const 
      -fipa-profile 
      -fipa-reference 
      -fmerge-constants 
      -fmove-loop-invariants 
      -freorder-blocks 
      -fshrink-wrap 
      -fsplit-wide-types 
      -fssa-backprop 
      -fssa-phiopt 
      -ftree-bit-ccp 
      -ftree-ccp 
      -ftree-ch 
      -ftree-coalesce-vars 
      -ftree-copy-prop 
      -ftree-dce 
      -ftree-dominator-opts 
      -ftree-dse 
      -ftree-forwprop 
      -ftree-fre 
      -ftree-phiprop 
      -ftree-sink 
      -ftree-slsr 
      -ftree-sra 
      -ftree-pta 
      -ftree-ter 
      -funit-at-a-time
      -fthread-jumps 
      -falign-functions  -falign-jumps 
      -falign-loops  -falign-labels 
      -fcaller-saves 
      -fcrossjumping 
      -fcse-follow-jumps  -fcse-skip-blocks 
      -fdelete-null-pointer-checks 
      -fdevirtualize -fdevirtualize-speculatively 
      -fexpensive-optimizations 
      -fgcse  -fgcse-lm  
      -fhoist-adjacent-loads 
      -finline-small-functions 
      -findirect-inlining 
      -fipa-cp 
      -fipa-cp-alignment 
      -fipa-sra 
      -fipa-icf 
      -fisolate-erroneous-paths-dereference 
      -flra-remat 
      -foptimize-sibling-calls 
      -foptimize-strlen 
      -fpartial-inlining 
      -fpeephole2 
      -freorder-blocks-algorithm=stc 
      -freorder-blocks-and-partition -freorder-functions 
      -frerun-cse-after-loop  
      -fsched-interblock  -fsched-spec 
      -fschedule-insns  -fschedule-insns2 
      -fstrict-aliasing -fstrict-overflow 
      -ftree-builtin-call-dce 
      -ftree-switch-conversion -ftree-tail-merge 
      -ftree-pre 
      -ftree-vrp 
      -fipa-ra

Each of these flags corresponds to a different possible optimization the compiler is allowed to make.

like image 23
Chris Avatar answered Sep 23 '22 00:09

Chris