
What do the gcc assembly output labels signify?

Tags: gcc, assembly

I've written a simple C program test.c:

#include <stdio.h>
#include <stdlib.h>
int add(int a, int b);
int main()
{
    int i=5,j=10;
    int result;
    result = add(i, j);
    printf("result is %d\n", result);
}
int add(int a, int b)
{
    return (a + b);
}

and I compiled it:

gcc -S -Os -o test.s test.c 

and I get the assembly file test.s:

        .file   "test3.c"
    .section    .rodata
.LC0:
    .string "result is %d\n"
    .text
.globl main
    .type   main, @function
main:
.LFB5:
    pushq   %rbp
.LCFI0:
    movq    %rsp, %rbp
.LCFI1:
    subq    $16, %rsp
.LCFI2:
    movl    $5, -12(%rbp)
    movl    $10, -8(%rbp)
    movl    -8(%rbp), %esi
    movl    -12(%rbp), %edi
    call    add
    movl    %eax, -4(%rbp)
    movl    -4(%rbp), %esi
    movl    $.LC0, %edi
    movl    $0, %eax
    call    printf
    leave
    ret
.LFE5:
    .size   main, .-main
.globl add
    .type   add, @function
add:
.LFB6:
    pushq   %rbp
.LCFI3:
    movq    %rsp, %rbp
.LCFI4:
    movl    %edi, -4(%rbp)
    movl    %esi, -8(%rbp)
    movl    -8(%rbp), %eax
    addl    -4(%rbp), %eax
    leave
    ret
.LFE6:
    .size   add, .-add
    .section    .eh_frame,"a",@progbits
.Lframe1:
    .long   .LECIE1-.LSCIE1
.LSCIE1:
    .long   0x0
    .byte   0x1
    .string "zR"
    .uleb128 0x1
    .sleb128 -8
    .byte   0x10
    .uleb128 0x1
    .byte   0x3
    .byte   0xc
    .uleb128 0x7
    .uleb128 0x8
    .byte   0x90
    .uleb128 0x1
    .align 8
.LECIE1:
.LSFDE1:
    .long   .LEFDE1-.LASFDE1
.LASFDE1:
    .long   .LASFDE1-.Lframe1
    .long   .LFB5
    .long   .LFE5-.LFB5
    .uleb128 0x0
    .byte   0x4
    .long   .LCFI0-.LFB5
    .byte   0xe
    .uleb128 0x10
    .byte   0x86
    .uleb128 0x2
    .byte   0x4
    .long   .LCFI1-.LCFI0
    .byte   0xd
    .uleb128 0x6
    .align 8
.LEFDE1:
.LSFDE3:
    .long   .LEFDE3-.LASFDE3
.LASFDE3:
    .long   .LASFDE3-.Lframe1
    .long   .LFB6
    .long   .LFE6-.LFB6
    .uleb128 0x0
    .byte   0x4
    .long   .LCFI3-.LFB6
    .byte   0xe
    .uleb128 0x10
    .byte   0x86
    .uleb128 0x2
    .byte   0x4
    .long   .LCFI4-.LCFI3
    .byte   0xd
    .uleb128 0x6
    .align 8
.LEFDE3:
    .ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-48)"
    .section    .note.GNU-stack,"",@progbits

I understand all the instructions, but I really don't understand what the labels mean: .LC0, .LFB5, .LCFI0, .LCFI1, .LCFI2, .LFE5, and so on. They are generated automatically by gcc. Why does it need them? Some of them seem redundant.

  • gcc version: 4.1.2
  • machine: x86_64
Jak.Ding asked Mar 21 '12 06:03


2 Answers

The compiler will generate a label for any place it needs to refer to an address, whether it be for a jump or branch instruction, or for a data location.

The compiler has no need to create intuitively named labels, since they are referenced only by the code it generates and are not visible to the end user. It therefore generates more-or-less sequentially numbered labels, with a scheme that prevents the same label from being created for two different locations.

There is no disadvantage to labelling the same location with two (or more) labels, so the compiler makes no attempt to avoid it. That is why a few locations carry two consecutive labels with no intervening instructions.

If you really want to know what, for example, the LCx and LFBx series of labels mean, read the compiler source code. It is a non-trivial code base, so expect to spend hours just looking for the relevant module.


I rose to the challenge myself. Having some compiler-writing experience, I found the module /trunk/gcc/dwarf2out.c, which seems to generate label names using the same strategy. Look around line 250 for terse clues about what the symbols mean. Much of this module deals with determining the labels, but it is nearly 23,000 lines long, so it could well test your curiosity.

wallyk answered Oct 06 '22 00:10


Try gcc -fverbose-asm -fdump-tree-all -S -Os -o test.s test.c to get much more information, notably many "dump" files named test.c.* that give a partial view of GCC's internal representations.

Don't be bothered by apparently useless labels. I guess that GCC could generate one for each basic block.

Recall that GCC does most of its work on internal representations (notably Gimple and Tree). The optimization passes (there are hundreds of them) modify these representations significantly. Most optimizations happen in the middle-end, working on Gimple.

My slides on http://gcc-melt.org/ have a bit more detailed explanations (and you can find many others on the web).

Consider using MELT (a domain specific language to extend GCC 4.6 or later) to explore (or even modify) the internal GCC representations. MELT is very well suited for that goal.


NB:

Your gcc-4.1 is several years old. GCC 4.7 has just been released (actually the second release candidate of 4.7.0), and GCC has made a lot of progress since 4.1, which appeared in 2006. You really should use a newer version (4.6 at least) if you care about optimizations. You can ask questions about GCC internals on gcc@gcc.gnu.org (the list for those developing or hacking the compiler), but most GCC contributors have forgotten the details of 4.1. Use gcc-help@gcc.gnu.org for general help with GCC (i.e. how to build or use it).

Basile Starynkevitch answered Oct 05 '22 23:10