I've written a simple C program, test.c:
#include <stdio.h>
#include <stdlib.h>

int add(int a, int b);

int main()
{
    int i=5,j=10;
    int result;
    result = add(i, j);
    printf("result is %d\n", result);
}

int add(int a, int b)
{
    return (a + b);
}
I compiled it with:
gcc -S -Os -o test.s test.c
and got the assembly file test.s:
.file "test3.c"
.section .rodata
.LC0:
.string "result is %d\n"
.text
.globl main
.type main, @function
main:
.LFB5:
pushq %rbp
.LCFI0:
movq %rsp, %rbp
.LCFI1:
subq $16, %rsp
.LCFI2:
movl $5, -12(%rbp)
movl $10, -8(%rbp)
movl -8(%rbp), %esi
movl -12(%rbp), %edi
call add
movl %eax, -4(%rbp)
movl -4(%rbp), %esi
movl $.LC0, %edi
movl $0, %eax
call printf
leave
ret
.LFE5:
.size main, .-main
.globl add
.type add, @function
add:
.LFB6:
pushq %rbp
.LCFI3:
movq %rsp, %rbp
.LCFI4:
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
movl -8(%rbp), %eax
addl -4(%rbp), %eax
leave
ret
.LFE6:
.size add, .-add
.section .eh_frame,"a",@progbits
.Lframe1:
.long .LECIE1-.LSCIE1
.LSCIE1:
.long 0x0
.byte 0x1
.string "zR"
.uleb128 0x1
.sleb128 -8
.byte 0x10
.uleb128 0x1
.byte 0x3
.byte 0xc
.uleb128 0x7
.uleb128 0x8
.byte 0x90
.uleb128 0x1
.align 8
.LECIE1:
.LSFDE1:
.long .LEFDE1-.LASFDE1
.LASFDE1:
.long .LASFDE1-.Lframe1
.long .LFB5
.long .LFE5-.LFB5
.uleb128 0x0
.byte 0x4
.long .LCFI0-.LFB5
.byte 0xe
.uleb128 0x10
.byte 0x86
.uleb128 0x2
.byte 0x4
.long .LCFI1-.LCFI0
.byte 0xd
.uleb128 0x6
.align 8
.LEFDE1:
.LSFDE3:
.long .LEFDE3-.LASFDE3
.LASFDE3:
.long .LASFDE3-.Lframe1
.long .LFB6
.long .LFE6-.LFB6
.uleb128 0x0
.byte 0x4
.long .LCFI3-.LFB6
.byte 0xe
.uleb128 0x10
.byte 0x86
.uleb128 0x2
.byte 0x4
.long .LCFI4-.LCFI3
.byte 0xd
.uleb128 0x6
.align 8
.LEFDE3:
.ident "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-48)"
.section .note.GNU-stack,"",@progbits
I understand all these instructions, but I really don't understand what the labels mean: .LC0, .LFB5, .LCFI0, .LCFI1, .LCFI2, .LFE5, ... These labels are generated automatically by gcc. Why does it need them? Some of them seem redundant.
Luckily, gcc does not output binary machine code directly. Instead, it internally writes assembler code, which is then translated by as into binary machine code (in fact, gcc creates more intermediate structures along the way). This internal assembler code can be written to a file, with some annotation to make it easier to read.
The GNU Assembler, commonly known as gas or as, is the assembler developed by the GNU Project. It is the default back-end of GCC. It is used to assemble the GNU operating system and the Linux kernel, and various other software.
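loc means "line of code": a .loc directive ties the instructions that follow it to a line of the source file, for debug information.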
The compiler will generate a label for any place it needs to refer to an address, whether it be for a jump or branch instruction, or for a data location.
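As a rough illustration of the "data location" case, here is a minimal sketch built around the string from the question. The function name result_format is made up for this example. When a file like this is compiled with gcc -S, the string literal is typically placed in a read-only section under a local label such as .LC0, and the function body loads that label's address; the exact label name and section depend on the GCC version and target.

```c
/* The literal has to be stored somewhere in the object file, so the
   compiler emits it under a local label (for example .LC0) and the
   code below simply returns the address of that label. */
const char *result_format(void)
{
    return "result is %d\n";
}
```

This is the same role .LC0 plays in the listing above, where its address is loaded into %edi before the call to printf.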
The compiler has no need to create intuitively named labels, since they are only referenced by code it generates and have no end-user visibility. It therefore generates more-or-less sequentially named labels, with a scheme to prevent accidentally creating the same label for two different locations.
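A naming scheme of this kind can be sketched in a few lines of C. This is only an illustration of the idea, not GCC's actual code: the struct label_family type and the next_label function are hypothetical names. Each family of labels (LC, LFB, LCFI, ...) keeps its own counter, so two labels can never collide as long as the prefixes differ or the counter has advanced.

```c
#include <stdio.h>

/* Hypothetical sketch: one counter per label family (prefix). */
struct label_family {
    const char *prefix;   /* e.g. "LC", "LFB", "LCFI" */
    unsigned next;        /* next unused number in this family */
};

/* Write the next label of the family into buf (".LC0", then ".LC1", ...)
   and advance the counter. Returns what snprintf returns. */
int next_label(struct label_family *fam, char *buf, size_t size)
{
    return snprintf(buf, size, ".%s%u", fam->prefix, fam->next++);
}
```

Starting from a family with prefix "LCFI" and counter 0, successive calls yield .LCFI0, .LCFI1, .LCFI2, which matches the pattern of the numbering seen in the listing.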
There is no disadvantage to labelling the same location with two (or more) labels, so there is no attempt to avoid that. That is why there are a few locations with two sequential labels and no intervening instructions.
If you really want to know what, for example, the LCx and LFBx series of labels mean, read the compiler source code. This is a non-trivial code base, so expect to spend hours just looking for the relevant module.
I rose to the challenge and, having some compiler writing experience, found the module /trunk/gcc/dwarf2out.c, which seems to generate label names using the same strategy. Look around line 250 for terse clues about what the symbols mean. Much of this module determines the labels, but it is nearly 23,000 lines long, so it could well test your curiosity.
Try gcc -fverbose-asm -fdump-tree-all -S -Os -o test.s test.c to get much more information, notably many "dump" files named test.c.* that contain a partial view of GCC's internal representations.
Don't be bothered by apparently useless labels. I guess that GCC could generate one for each basic block.
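To see such labels for yourself, a small function with a loop and a conditional is enough. The following is a sketch for experimenting (count_negative is a made-up name, not something from the question). Compiling it with gcc -S should show local labels such as .L2 or .L3 at the loop and branch targets, although the exact names and how many appear depend on the GCC version and the optimization level.

```c
/* Counts how many of the n elements of v are negative.
   The loop and the if each introduce branch targets, which the
   compiler marks with local labels in the generated assembly. */
int count_negative(const int *v, int n)
{
    int count = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (v[i] < 0)
            count++;
    }
    return count;
}
```

Comparing the output at different optimization levels shows how the set of labels changes as the compiler restructures the code.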
Recall that GCC does most of its work on internal representations (notably Gimple and Tree). The optimization passes (there are hundreds of them) modify these internal representations significantly. Most optimizations happen in the middle-end, working on Gimple and related forms.
My slides on http://gcc-melt.org/ have a bit more detailed explanations (and you can find many others on the web).
Consider using MELT (a domain specific language to extend GCC 4.6 or later) to explore (or even modify) the internal GCC representations. MELT is very well suited for that goal.
Your gcc-4.1 is several years old. GCC 4.7 has just been released (actually, the second release candidate of 4.7.0), and GCC has made a lot of progress since 4.1, which appeared in 2006. You really should use a newer version (4.6 at least) if you care about optimizations. You can ask questions about GCC internals on [email protected] (the list for those developing or hacking the compiler), but most GCC contributors have forgotten the details of 4.1. Use [email protected] for general help with GCC (i.e. how to build or use it).