The test is on 32-bit x86 Linux.
I am trying to log which basic blocks are executed by inserting instrumentation instructions into the assembly code.
My strategy is this: write the index of each executed basic block into a global array, and flush the array from memory to disk when the array is full (16M).
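For reference, the buffering strategy described above could look roughly like the following in C. This is only a sketch under assumptions: "16M" is read as 16 MiB of 32-bit indices, and the names trace_record, trace_flush and the output file bb_trace.bin are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: "16M" means 16 MiB worth of 32-bit block indices. */
#define TRACE_CAPACITY (16u * 1024u * 1024u / sizeof(uint32_t))

static uint32_t trace_buf[TRACE_CAPACITY];
static size_t trace_len = 0;
static FILE *trace_out = NULL;  /* hypothetical log file, opened lazily */

/* Write the buffered indices to disk and reset the buffer.
 * Returns the number of indices that were pending. */
size_t trace_flush(void)
{
    size_t n = trace_len;
    if (n == 0)
        return 0;
    if (trace_out == NULL)
        trace_out = fopen("bb_trace.bin", "wb");  /* hypothetical file name */
    if (trace_out != NULL) {
        fwrite(trace_buf, sizeof trace_buf[0], n, trace_out);
        fflush(trace_out);
    }
    trace_len = 0;
    return n;
}

/* Called by the instrumentation at the start of each basic block. */
void trace_record(uint32_t bb_index)
{
    if (trace_len == TRACE_CAPACITY)
        trace_flush();
    assert(trace_len < TRACE_CAPACITY);
    trace_buf[trace_len++] = bb_index;
}
```

The open question is then only who calls trace_flush() one last time when the process ends.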
Here is my problem. I need to flush the array to disk when the instrumented binary finishes executing, even if the array has not reached the 16M boundary. However, I don't know where to find the exit point of an assembly program.
I tried two things:
1. grep for exit in the target assembly program, and flush the memory right before the call exit instruction. But from what I saw while debugging, the target C program (say, an md5sum binary) does not call exit when it finishes executing.
2. Flush the memory at the end of the main function. However, in the assembly code I cannot tell where exactly main ends. I could be conservative and look for every ret instruction, but it seems to me that not every main function ends with a ret instruction.
So here is my question: how can I identify the exact point where the execution of an assembly program ends, and insert some instrumentation instructions there? Hooking some library code is fine with me. I understand that with different inputs the binary could exit at different positions, so I guess I need some conservative estimate. Am I clear? Thanks!
The exit() function in C. The exit() function terminates the calling process. Before the process ends, exit() runs the handlers registered with atexit(), then flushes and closes all open stdio streams.
Returning 0 (or EXIT_SUCCESS) from main, or passing it to exit(), indicates that the program completed successfully. A nonzero exit code (such as EXIT_FAILURE) indicates an error.
Exit failure: conventionally indicated by exit(1), meaning the program terminated because of an error. Different nonzero integers can be used to indicate different types of errors.
I believe you cannot do that in the general case. First, if main returns some value, that value is the exit code (if main has no explicit return, the C standards since C99 require the compiler to behave as if there were an implicit return 0;). Then a function could store the address of exit in some data (e.g. a global function pointer, a field in a struct, ...), and some other function could call it indirectly through that pointer. In practice, a program can load plugins using dlopen and use dlsym to look up the "exit" name, or simply call exit inside the plugin, etc. AFAIU, solving that problem (finding the actual exit calls, in the dynamic sense) in full generality can be proved equivalent to the halting problem. See also Rice's theorem.
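To make the function-pointer case concrete, here is a small made-up C fragment (the names ops, global_ops and finish are hypothetical). Compiled for x86, the call in finish becomes an indirect call through memory, so grepping the assembly for "call exit" will not find it:

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*terminator_fn)(int);

/* The address of exit is stored in data, not named at the call site. */
struct ops {
    terminator_fn quit;
};

static struct ops global_ops = { exit };

/* Terminates the process, but only through the stored pointer. */
void finish(int status)
{
    assert(global_ops.quit != NULL);
    global_ops.quit(status);
}
```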
Without claiming an exhaustive approach, I would suggest something else (assuming you are interested in instrumenting programs coded in C or C++, etc... whose source code is available to you). You could customize the GCC compiler with MELT to change the basic blocks processed inside GCC to call some of your instrumentation functions. It is not trivial, but it is doable... Of course you'll need to recompile some C code with such a customized GCC to instrument it.
(Disclaimer, I am the main author of MELT; feel free to contact me for more...)
BTW, do you know about atexit(3)? It could be helpful for your flushing issue... And you might also use LD_PRELOAD tricks (read about dynamic linkers, see ld-linux(8)).
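A minimal sketch of how atexit and LD_PRELOAD could be combined, assuming GCC (for the constructor attribute) and glibc. The function flush_trace is a hypothetical stand-in for whatever actually writes the array to disk; here it only prints a marker to stderr.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

static int flush_count = 0;
static int handler_registered = 0;

/* Hypothetical stand-in for the real flush of the basic-block array. */
static void flush_trace(void)
{
    static const char msg[] = "trace flushed\n";
    ssize_t r = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)r;
    flush_count++;
}

/* Runs when the shared object is loaded, before main() starts. */
__attribute__((constructor))
static void install_flush_handler(void)
{
    /* atexit() returns 0 on success. */
    handler_registered = (atexit(flush_trace) == 0);
}
```

Assuming the file is saved as flush.c, it could be built with something like gcc -m32 -shared -fPIC flush.c -o libflush.so and used as LD_PRELOAD=./libflush.so ./md5sum somefile. The handler then runs whenever the program calls exit() or returns from main, without touching the target's assembly.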
atexit() will properly handle 95+% of programs. You can either modify its chain of registered handlers, or instrument it as you do the other blocks. However, some programs may terminate by calling _exit(), which does not invoke atexit handlers. Instrumenting _exit to flush the data and installing an atexit (or on_exit() on BSD-like programs) handler should probably cover nearly 100% of programs.
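One possible way to instrument _exit without editing the target is to interpose it from a preloaded library. The sketch below is an assumption-laden illustration, not a drop-in solution: pending, trace_fd and flush_pending are hypothetical names, the flush uses only write(2) because stdio state may not be usable at that point, and the wrapper ends the process with the Linux exit_group system call rather than calling back into libc. It only catches calls to _exit that go through the dynamic linker.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical trace state; the real tool would use its own array. */
static uint32_t pending[4];
static size_t pending_len = 0;
static int trace_fd = -1;  /* assumption: set to an open descriptor at startup */

/* Flush with plain write(2) only. */
static void flush_pending(void)
{
    const char *p = (const char *)pending;
    size_t left;

    assert(pending_len <= sizeof pending / sizeof pending[0]);
    left = pending_len * sizeof pending[0];
    while (left > 0) {
        ssize_t n = write(trace_fd, p, left);
        if (n <= 0)
            break;  /* give up on error; the process is exiting anyway */
        p += n;
        left -= (size_t)n;
    }
    pending_len = 0;
}

/* Interposed definition: dynamically resolved calls to _exit() land here. */
void _exit(int status)
{
    flush_pending();
    syscall(SYS_exit_group, status);
    for (;;) { }  /* not reached; keeps the function from returning */
}
```

Programs that call _Exit(), or that issue the exit system call directly, would still slip past this wrapper.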
Addendum: Note that the Linux Standard Base specification says that the C library startup shall:
1. call the initializer function (*init)().
2. call main() with appropriate arguments.
3. call exit() with the return value from main().