
Difference between exit() and return in main() function in C

Tags:

c

I've looked through the links What is the difference between exit and return? and return statement vs exit() in main() to find the answer, but in vain.

Problem with the first link is that the answer assumes return from any function. I want to know the exact difference between the two when in main() function. Even if there's a little difference I'd like to know what it is. Which is preferred and why? Is there any performance gain in using return over exit() (or exit() over return) with all sorts of compiler optimizations turned off?

Problem with the second link is I'm not interested in knowing what happens in C++. I want the answer specifically pertaining to C.

EDIT: After recommendation by a person, I actually tried to compare the assembly output of the following programs:

Note: Using gcc -S <myprogram>.c

Program mainf.c:

int main(void)
{
    return 0;
}

Assembly output:

    .file   "mainf.c"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.9.2-10ubuntu13) 4.9.2"
    .section    .note.GNU-stack,"",@progbits

Program mainf1.c:

#include <stdlib.h>

int main(void)
{
    exit(0);
}

Assembly output:

    .file   "mainf1.c"
    .text
    .globl  main
    .type   main, @function
main:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $0, %edi
    call    exit
    .cfi_endproc
.LFE2:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.9.2-10ubuntu13) 4.9.2"
    .section    .note.GNU-stack,"",@progbits

I'm not well versed in assembly, but I can see some differences between the two programs, with the exit() version being shorter than the return version. What's the difference?

Mayank Verma asked Feb 18 '16 02:02



2 Answers

There is practically no difference between calling exit or executing return from main as long as main returns a type that is compatible with int.

From the C11 Standard:

5.1.2.2.3 Program termination

1 If the return type of the main function is a type compatible with int, a return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0. If the return type is not compatible with int, the termination status returned to the host environment is unspecified.

R Sahu answered Sep 22 '22 20:09



Disclaimer: This answer does not quote the C Standards.

TL;DR

Both methods jump into GLibC code, and to know exactly what that code does, or which method is faster or more efficient, you'll need to read it. If you want to know more about GLibC, you should check the sources for GCC and GLibC. They are listed at the end.


Syscalls, wrappers and GLibC

First: there's a difference between exit(3) and _exit(2). The first is a GLibC library function that does its cleanup (running the handlers registered with atexit, flushing stdio buffers) and then calls the second, which is a system call. The one we use in our program, and which requires including stdlib.h, is exit(3): the GLibC wrapper, not the system call.

Now, programs are not just your simple instructions. They contain heavy loads of GLibC's own instructions. These GLibC functions serve several purposes related to loading and providing the library functionality you use. For that to work GLibC must be "inside" your program.

So, how is GLibC inside your program? Well, it puts itself there through your compiler (it sets some static code and some hooks into the dynamic library) - most likely you're using gcc.


The 'return 0;' method

I suppose you know what stack frames are, so I won't explain them. The cool thing to notice is that main() itself has its own stack frame. And that stack frame returns somewhere, and it must return... But to where?

Let's compile the following:

int main(void)
{
        return 0;
}

And compile and debug it with:

$ gcc -o main main.c

$ gdb main

(gdb) disass main
Dump of assembler code for function main:
0x00000000004005e8 <+0>:     push   %rbp
0x00000000004005e9 <+1>:     mov    %rsp,%rbp
0x00000000004005ec <+4>:     mov    $0x0,%eax
0x00000000004005f1 <+9>:     pop    %rbp
0x00000000004005f2 <+10>:    retq
End of assembler dump.

(gdb) break main
(gdb) run 
Breakpoint 1, 0x00000000004005ec in main ()  
(gdb) stepi
...

Now, stepi will make for the fun part. It steps one instruction at a time, so it's perfect for following function calls. After you run stepi for the first time, just hold your finger on ENTER until you get tired.

What you must observe is the sequence in which functions are called with this method. You see, ret is a "jumping" instruction (edit: after David Hoelzer's comment, I see that calling ret a simple jump is an over-generalization): after we pop rbp, ret itself pops the return address from the stack and jumps to it. So, if GLibC built that stack frame, retq makes our return 0; C statement jump right into GLibC's own code! How clever!

The order of function calls I got started roughly like this:

__libc_start_main
exit
__run_exit_handlers
_dl_fini
rtld_lock_default_lock_recursive
_dl_fini
_dl_sort_fini

The 'exit(0);' method

Compiling this:

#include <stdlib.h>
int main(void)
{
        exit(0);
}

And compiling and debugging...

$ gcc -o exit exit.c

$ gdb exit
(gdb) disass main
Dump of assembler code for function main:
0x0000000000400628 <+0>:     push   %rbp
0x0000000000400629 <+1>:     mov    %rsp,%rbp
0x000000000040062c <+4>:     mov    $0x0,%edi
0x0000000000400631 <+9>:     callq  0x4004d0 <exit@plt>
End of assembler dump.
(gdb) break main
(gdb) run
Breakpoint 1, 0x000000000040062c in main ()
(gdb) stepi
...

And the function sequence I got was:

exit@plt
??
_dl_runtime_resolve
_dl_fixup
_dl_lookup_symbol_x
do_lookup_x
check_match
_dl_name_match
strcmp

List object's Symbols

There's a cool tool for printing the symbols defined within a binary: nm. I suggest you take a look at it, as it will give you an idea of how much "crap" is added to a simple program like the ones above.

To use it in the simplest form:

$ nm main
$ nm exit

That will print a list of the symbols in the file. Note that the list does not follow the calls those functions make in turn: if a function in the list calls another function, the other one probably won't be in the list.


Conclusion

It depends heavily on the way GLibC chooses to handle a simple stack frame return from main and how it implements the exit wrapper. In the end, the _exit(2) system call gets called and your process exits.

Finally, to really answer your question: both methods jump into GLibC code, and to know exactly what that code is doing you'll need to read it. If you want to know more about GLibC, you should check the sources for GCC and GLibC.


References

  • GLibC Source Repository: Look in stdlib/exit.c and stdlib/exit.h for the implementations.
  • Linux Kernel Exit Definition: look in kernel/exit.c for the _exit(2) system call implementation, and include/syscalls.h for the preprocessor magic behind it.
  • GCC Sources: I do not know the gcc (compiler, not suite) sources, and would appreciate it if anyone could point out where the runtime sequence is defined.
Enzo Ferber answered Sep 20 '22 20:09
