
How to find the "exit" of a C program

The test is on 32-bit x86 Linux.

Basically, I am trying to log which basic blocks are executed by inserting instrumentation instructions into the assembly code.

My strategy is this: write the index of each executed basic block into a global array, and flush the array from memory to disk when the array is full (16M).
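To make the strategy concrete, here is a minimal C sketch of such a buffer. It is only an illustration under stated assumptions: the names trace_buf, trace_record, trace_flush and trace_out are hypothetical, the capacity is shrunk to 4 entries so the flush path is easy to exercise (the real array would hold 16M), and the real instrumentation would be emitted as assembly rather than as C calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical capacity: tiny for illustration; the question uses 16M. */
#define TRACE_CAPACITY 4u

static uint32_t trace_buf[TRACE_CAPACITY]; /* indices of executed blocks */
static size_t trace_len = 0;               /* entries currently buffered */
static FILE *trace_out = NULL;             /* assumed to be opened by the caller */

/* Write all buffered indices to trace_out and empty the buffer.
 * Returns the number of entries that were buffered; they are dropped
 * if trace_out has not been opened. */
size_t trace_flush(void) {
    size_t n = trace_len;
    if (n > 0 && trace_out != NULL) {
        size_t written = fwrite(trace_buf, sizeof trace_buf[0], n, trace_out);
        if (written != n) {
            perror("fwrite");
        }
        fflush(trace_out);
    }
    trace_len = 0;
    return n;
}

/* Record one executed basic block; flush when the buffer becomes full. */
void trace_record(uint32_t block_index) {
    assert(trace_len < TRACE_CAPACITY); /* invariant kept by the flush below */
    trace_buf[trace_len++] = block_index;
    if (trace_len == TRACE_CAPACITY) {
        trace_flush();
    }
}
```

With this layout, whatever is still in trace_buf when the process ends is lost unless trace_flush() runs one last time, which is exactly the problem described next.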

Here is my problem. I need to flush the array to disk when the execution of the instrumented binary is over, even if it has not reached the 16M boundary. However, I don't know where to find the exit of an assembly program.

I tried this:

  1. Grep for exit in the target assembly program, and flush the memory right before the call exit instruction. But in my debugging experience, the target C program (say, an md5sum binary) does not call exit when it finishes execution.

  2. Flush the memory at the end of the main function. However, in the assembly code I don't know where the exact end of main is. I could take a conservative approach, say, instrumenting every ret instruction, but it seems to me that not every main function ends with a ret instruction.

So here is my question: how can I identify the exact point where execution of an assembly program ends, and insert some instrumentation instructions there? Hooking some library code is fine with me. I understand that with different inputs the binary could exit at different positions, so I guess I need some conservative estimate. Am I clear? Thanks!

asked Jul 22 '15 16:07 by lllllllllllll



2 Answers

I believe you cannot do that in the general case. First, if main returns some value, that value is the exit code (if main has no explicit return, recent C standards require the compiler to add an implicit return 0;). Second, a function could store the address of exit in some data (e.g. a global variable, a field in a struct, ...), and some other function could call it indirectly through a function pointer. In practice, a program can also load plugins using dlopen and look up the name "exit" with dlsym, or simply call exit inside a plugin, etc. AFAIU, solving that problem (finding the actual exit calls, in the dynamic sense) in full generality can be proved equivalent to the halting problem. See also Rice's theorem.
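As a small illustration of the function-pointer case, here is a hedged C sketch. All names in it (handlers, g_handlers, fail, record_status, use_recording_handler) are made up for the example. It shows a termination site where the callee is only known at run time, so searching the assembly for a direct call to exit would miss it.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*terminate_fn)(int status);

/* Hypothetical program state: the address of exit() is stored as data. */
struct handlers {
    terminate_fn on_fatal;
};

static struct handlers g_handlers = { exit };

/* Alternative target, used only to show that the callee is chosen at run time. */
static int last_status = -1;
static void record_status(int status) {
    last_status = status;
}

void use_recording_handler(void) {
    g_handlers.on_fatal = record_status;
}

/* The compiler typically emits an indirect call or jump through the
 * pointer here, so no direct "call exit" appears at this site. */
void fail(int status) {
    assert(g_handlers.on_fatal != NULL);
    g_handlers.on_fatal(status);
}
```

Calling fail(3) with the default handler terminates the process through exit; after use_recording_handler() the same call site merely records the status. A static scan cannot tell which of the two will happen.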

Without claiming an exhaustive approach, I would suggest something else (assuming you are interested in instrumenting programs coded in C, C++, etc. whose source code is available to you). You could customize the GCC compiler with MELT to change the basic blocks processed inside GCC so that they call some of your instrumentation functions. It is not trivial, but it is doable. Of course, you'll need to recompile the C code with such a customized GCC to instrument it.

(Disclaimer, I am the main author of MELT; feel free to contact me for more...)

BTW, do you know about atexit(3)? It could be helpful for your flushing issue... And you might also use LD_PRELOAD tricks (read about dynamic linkers, see ld-linux(8)).
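A minimal sketch of how atexit(3) and LD_PRELOAD might be combined, assuming GCC or Clang (the constructor attribute is a compiler extension). The names flush_trace, install_flush_hook, hooks_installed and flushes_done are hypothetical, and flush_trace is only a placeholder for the code that would write the block-index array to disk.

```c
#include <assert.h>
#include <stdlib.h>

static int hooks_installed = 0; /* set once the atexit handler is registered */
static int flushes_done = 0;    /* how many times the handler has run */

/* Hypothetical placeholder: the real version would write the
 * block-index array to disk. */
static void flush_trace(void) {
    flushes_done++;
}

/* GCC/Clang extension: runs when the object is loaded, before main(). */
__attribute__((constructor))
static void install_flush_hook(void) {
    int rc = atexit(flush_trace);
    assert(rc == 0); /* atexit returns nonzero if registration failed */
    (void)rc;
    hooks_installed = 1;
}
```

Built as a shared object, for example with gcc -m32 -shared -fPIC -o libflushhook.so flushhook.c (file names are made up; -m32 matches the 32-bit target in the question), it could be injected with LD_PRELOAD=./libflushhook.so ./target. This assumes the preloaded library can reach the array of the instrumented binary, for instance through an exported symbol. The handler runs on exit() and on a return from main, but not on _exit().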

answered Oct 11 '22 14:10 by Basile Starynkevitch


atexit() will properly handle 95+% of programs. You can either modify its chain of registered handlers, or instrument it as you do the other blocks. However, some programs may terminate by calling _exit(), which does not invoke atexit handlers. Instrumenting _exit to flush the data and also installing an atexit handler (or an on_exit() handler on systems that provide it) should probably cover nearly 100% of programs.
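The two hooks can be sketched together in C as follows. This is an assumption-laden outline, not a drop-in solution: pending, flush_count, install_exit_hook and traced_exit_now are hypothetical names, and the actual disk write is left as a comment. The flush is written so that running it twice is harmless, since both hooks may fire in the same process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

static size_t pending = 0;     /* hypothetical: entries waiting in the buffer */
static size_t flush_count = 0; /* number of flushes that wrote something */

/* Safe to call more than once: a second call finds nothing to write. */
void flush_trace(void) {
    if (pending == 0) {
        return;
    }
    /* The real version would write the buffered indices to disk here. */
    pending = 0;
    flush_count++;
}

/* Covers exit() and a return from main(). Returns 0 on success, like atexit(). */
int install_exit_hook(void) {
    return atexit(flush_trace);
}

/* Covers _exit(): the rewriter would retarget `call _exit` to this
 * hypothetical wrapper, which flushes first and then really terminates. */
void traced_exit_now(int status) {
    flush_trace();
    assert(pending == 0); /* nothing may be left behind at this point */
    _exit(status);
}
```

Neither hook helps when the process is killed by a signal or calls abort(), so a small amount of trace data can still be lost in those cases.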


Addendum: Note that the Linux Standard Base specification says that the C library startup code shall:

  1. Call the initializer function (*init)().
  2. Call main() with appropriate arguments.
  3. Call exit() with the return value from main().

answered Oct 11 '22 13:10 by wallyk