How to divide disassembled C code into functions?

Tags: c, windows, x86, assembly

I have an application that creates dumps of the .text segment of Win32 processes and then divides the code into basic blocks. A basic block is a sequence of instructions that are always executed one after another (a jump is always the last instruction of such a basic block). Here is an example:

Basic block 1
    mov ecx, dword ptr [ecx]
    test ecx, ecx
    je 00401013h

Basic block 2
    mov eax, dword ptr [ecx]
    call dword ptr [eax+08h]

Basic block 3
    test eax, eax
    je 0040100Ah

Basic block 4
    mov edx, dword ptr [eax]
    push 00000001h
    mov ecx, eax
    call dword ptr [edx]

Basic block 5
    ret 000008h

Now I would like to group these basic blocks into functions, i.e. determine which basic blocks form each function. What's the algorithm? I have to keep in mind that a single function may contain several ret instructions. And how can I detect fastcall functions?


asked Feb 07 '13 16:02

Adam Sznajder

2 Answers

The simplest algorithm for grouping blocks into functions would be:

Note all addresses that are targets of direct call some_address instructions.
If the first block at such an address ends with ret, you're done with that function. Otherwise,
follow the jump (or fall-through) at the end of the block to the next block, and so on, until you have followed all possible execution paths (remember conditional jumps, each of which splits a path in two) and every path has ended with ret. You'll need to recognize jumps that form loops (jumps back to blocks you have already visited) so that your program itself doesn't hang in an infinite loop.
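
The traversal described in these steps is an ordinary graph search with a visited set. Here is a minimal sketch in C. It assumes the dump has already been split into a hypothetical `Block` array in which each block lists its successor blocks. Calls are treated as falling through to the next block, so callees are never followed.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 64   /* hypothetical limit; nblocks must not exceed it */
#define MAX_SUCCS 2     /* fall-through and/or jump target */

/* Hypothetical CFG node: indices of the blocks control can flow to next.
   A block ending in ret has no successors. */
typedef struct {
    int succ[MAX_SUCCS];
    int nsucc;
} Block;

/* Marks in member[] every block reachable from `entry` without following
   calls, and returns how many blocks were found. Each block is pushed at
   most once, so back edges (loops) cannot make the search run forever. */
size_t collect_function(const Block *blocks, size_t nblocks, int entry,
                        bool member[])
{
    int worklist[MAX_BLOCKS];
    size_t top = 0, count = 0;

    for (size_t i = 0; i < nblocks; i++)
        member[i] = false;

    worklist[top++] = entry;
    member[entry] = true;
    while (top > 0) {
        const Block *b = &blocks[worklist[--top]];
        count++;
        for (int s = 0; s < b->nsucc; s++) {
            int next = b->succ[s];
            if (!member[next]) {          /* not visited yet */
                member[next] = true;
                worklist[top++] = next;
            }
        }
    }
    return count;
}
```

A real tool would also have to stop at jumps into another known entry point (see the tail-call problem below). This sketch does not attempt that.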

Problems:

a number of calls can be made indirectly through function pointers read from memory, e.g. you'd have call [some_address] instead of call some_address
some indirect calls go to computed addresses (e.g. call eax)
a function that calls another function right before returning may use jmp some_address instead of call some_address immediately followed by ret (a tail call)
control transfers can be disguised: push some_address + ret acts as jmp some_address, and push some_address + jmp some_other_address acts as a call to some_other_address that returns to some_address
some functions may share code at their end (e.g. they have different entry points but one or more common exit points)
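
The push some_address + ret pattern can at least be spotted when the pushed value is an immediate. A small sketch in C, assuming you scan the raw bytes of the dump. It only recognizes the `68 imm32` / `C3` encoding and not pushes of registers or memory operands.

```c
#include <stddef.h>
#include <stdint.h>

/* Checks for "push imm32; ret" (bytes 68 xx xx xx xx C3) at offset `off`.
   That pair transfers control to imm32 like "jmp imm32" does, so the
   pushed value should be treated as a jump target. Returns 1 and stores
   the target in *target on a match, 0 otherwise. */
int match_push_ret(const uint8_t *code, size_t len, size_t off,
                   uint32_t *target)
{
    if (off + 6 > len || code[off] != 0x68 || code[off + 5] != 0xC3)
        return 0;
    /* x86 immediates are stored little-endian. */
    *target = (uint32_t)code[off + 1]
            | (uint32_t)code[off + 2] << 8
            | (uint32_t)code[off + 3] << 16
            | (uint32_t)code[off + 4] << 24;
    return 1;
}
```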

You can also use a heuristic to determine where functions start by looking for the most common prologue instruction sequence:

push ebp
mov ebp, esp

Again, this may not work if functions are compiled with frame-pointer omission (e.g. MSVC's /Oy option), in which case they use esp instead of ebp to access their parameters on the stack.
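
A byte-level version of this prologue search is sketched below. It assumes you scan the raw .text dump. `push ebp` is `55`, and `mov ebp, esp` has two valid encodings: `8B EC`, which MSVC emits, and `89 E5`, which GNU as emits. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if the bytes at `off` encode "push ebp; mov ebp, esp". */
static int is_frame_prologue(const uint8_t *code, size_t len, size_t off)
{
    if (off + 3 > len || code[off] != 0x55)
        return 0;
    return (code[off + 1] == 0x8B && code[off + 2] == 0xEC) ||
           (code[off + 1] == 0x89 && code[off + 2] == 0xE5);
}

/* Writes the offsets of candidate prologues into out[] (at most max) and
   returns how many were found. A raw scan can also match these bytes
   inside another instruction's operands, so treat results as candidates
   to be confirmed, e.g. by the call-target analysis. */
size_t find_prologues(const uint8_t *code, size_t len, size_t *out,
                      size_t max)
{
    size_t n = 0;
    for (size_t off = 0; off < len && n < max; off++)
        if (is_frame_prologue(code, len, off))
            out[n++] = off;
    return n;
}
```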

The compiler (e.g. MSVC++) may also pad the space between functions with int 3 instructions (opcode 0xCC), and that too can serve as a hint that a function begins right after the padding.
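
The padding hint can be checked the same way. This sketch reports the first byte after each run of `0xCC` bytes. The `min_run` threshold is an assumption you would tune: a single `CC` byte can also appear inside immediates and displacements, so requiring longer runs gives fewer false positives.

```c
#include <stddef.h>
#include <stdint.h>

/* Reports the offset of the first non-0xCC byte after every run of at
   least `min_run` consecutive 0xCC (int 3) bytes. Each reported offset is
   a candidate function start. Returns the number of offsets written. */
size_t find_after_int3_padding(const uint8_t *code, size_t len,
                               size_t min_run, size_t *out, size_t max)
{
    size_t n = 0, run = 0;
    for (size_t off = 0; off < len && n < max; off++) {
        if (code[off] == 0xCC) {
            run++;
        } else {
            if (run > 0 && run >= min_run)
                out[n++] = off;
            run = 0;
        }
    }
    return n;
}
```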

As for telling the calling conventions apart, it's perhaps easiest to look at the symbols (if you have them, of course). MSVC++ generates different name prefixes and suffixes, e.g.:

_function - cdecl
_function@number - stdcall
@function@number - fastcall
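
These rules translate directly into a classifier for C (not C++-mangled) symbol names. In the stdcall and fastcall forms, the number after `@` is the byte count of the stack arguments. The sketch below is only as reliable as the naming rules above. Other leading underscores, such as import thunks, are not handled.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { CC_UNKNOWN, CC_CDECL, CC_STDCALL, CC_FASTCALL } CallConv;

/* Classifies an MSVC x86 C symbol by its decoration. For stdcall and
   fastcall, the argument byte count is stored in *arg_bytes; otherwise
   *arg_bytes is set to -1. arg_bytes may be NULL. */
CallConv classify_symbol(const char *name, long *arg_bytes)
{
    const char *at = strrchr(name, '@');
    long bytes = -1;
    CallConv cc = CC_UNKNOWN;

    if (at != NULL && at != name && at[1] != '\0') {
        char *end;
        bytes = strtol(at + 1, &end, 10);
        if (*end != '\0')
            bytes = -1;              /* suffix is not a plain number */
    }

    if (name[0] == '@' && bytes >= 0)
        cc = CC_FASTCALL;            /* @function@number */
    else if (name[0] == '_' && bytes >= 0)
        cc = CC_STDCALL;             /* _function@number */
    else if (name[0] == '_' && at == NULL)
        cc = CC_CDECL;               /* _function */

    if (arg_bytes != NULL)
        *arg_bytes = (cc == CC_STDCALL || cc == CC_FASTCALL) ? bytes : -1;
    return cc;
}
```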

If you cannot extract this information from the symbols, you must analyze code to see how parameters are passed to functions and whether functions or their callers remove them from the stack.
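
Whether the function or its caller removes the parameters can often be read from the function's returns. `ret imm16` (`C2 iw`) pops `imm16` bytes of arguments, as the `ret 000008h` in the question does. A plain `ret` (`C3`) leaves that to the caller. A minimal decoding sketch follows; the function name is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes a near return at `off` and returns the number of argument bytes
   the callee pops:
     C3        ret        -> 0      (caller cleans up, as in cdecl)
     C2 iw     ret imm16  -> imm16  (callee cleans up: stdcall, fastcall,
                                     MSVC thiscall)
   Returns -1 if the bytes at `off` are not a near ret. */
long ret_pop_bytes(const uint8_t *code, size_t len, size_t off)
{
    if (off >= len)
        return -1;
    if (code[off] == 0xC3)
        return 0;
    if (code[off] == 0xC2 && off + 3 <= len)
        return (long)(code[off + 1] | code[off + 2] << 8); /* little-endian */
    return -1;
}
```

A result of 0 is not conclusive on its own. A stdcall function without parameters, or a fastcall function whose arguments all fit in ECX/EDX, also ends with a plain `ret`.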

answered Oct 23 '22 14:10

Alexey Frunze

You could use the presence of an enter instruction, or of the code that sets up a stack frame, to mark the beginning of a function:

push ebp
mov  ebp, esp
sub  esp, (bytes for "local" stack space)

Later, right before the ret, you'll find the opposite code (or a leave instruction):

mov esp, ebp
pop ebp

You can also use the number of bytes reserved for local stack space to help identify local variables.
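
That byte count can be read straight from the instruction that follows the frame setup. `sub esp, imm8` is encoded as `83 EC ib`, with the immediate sign-extended, and `sub esp, imm32` as `81 EC id`. The sketch assumes the frame is allocated with a plain `sub esp`. Large frames are often allocated through a call to a stack-probe helper instead, and this sketch does not handle that case.

```c
#include <stddef.h>
#include <stdint.h>

/* If the bytes at `off` encode "sub esp, imm", stores imm (bytes reserved
   for locals) in *size and returns 1; otherwise returns 0. Meant to be
   called on the instruction right after "push ebp; mov ebp, esp". */
int local_frame_size(const uint8_t *code, size_t len, size_t off,
                     uint32_t *size)
{
    if (off + 3 <= len && code[off] == 0x83 && code[off + 1] == 0xEC) {
        /* imm8 form: the immediate is sign-extended to 32 bits. */
        *size = (uint32_t)(int32_t)(int8_t)code[off + 2];
        return 1;
    }
    if (off + 6 <= len && code[off] == 0x81 && code[off + 1] == 0xEC) {
        /* imm32 form, little-endian. */
        *size = (uint32_t)code[off + 2]
              | (uint32_t)code[off + 3] << 8
              | (uint32_t)code[off + 4] << 16
              | (uint32_t)code[off + 5] << 24;
        return 1;
    }
    return 0;
}
```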

Identifying thiscall, fastcall, etc. will take some analysis of the code just before the calls to the function's entry point, and an evaluation of which registers are used and how the stack is cleaned up.
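
Here is a rough sketch of the register evaluation for 32-bit MSVC conventions. fastcall passes the first two arguments in ECX and EDX. thiscall passes `this` in ECX. In both conventions, and in stdcall, the callee cleans up the stack. The three inputs are assumptions: they would come from a separate data-flow pass, not shown, over the function's blocks.

```c
#include <stdbool.h>

typedef enum {
    GUESS_CDECL,      /* no register args, caller cleans the stack */
    GUESS_STDCALL,    /* no register args, callee cleans the stack */
    GUESS_THISCALL,   /* ECX only, callee cleans stack arguments */
    GUESS_FASTCALL,   /* ECX and EDX carry the first two arguments */
    GUESS_AMBIGUOUS   /* evidence fits more than one convention */
} ConvGuess;

/* Heuristic guess. reads_*_first: the function reads the register before
   writing it. ret_pop_bytes: the imm16 of its ret (0 for a plain ret). */
ConvGuess guess_convention(bool reads_ecx_first, bool reads_edx_first,
                           unsigned ret_pop_bytes)
{
    if (reads_ecx_first && reads_edx_first)
        return GUESS_FASTCALL;
    if (reads_ecx_first)
        /* With stack arguments present, fastcall would normally use EDX as
           well; without them, one-argument fastcall and argument-less
           thiscall look the same. */
        return ret_pop_bytes > 0 ? GUESS_THISCALL : GUESS_AMBIGUOUS;
    if (reads_edx_first)
        return GUESS_AMBIGUOUS;
    /* A plain ret also fits a stdcall function with no parameters. */
    return ret_pop_bytes > 0 ? GUESS_STDCALL : GUESS_CDECL;
}
```

For example, the blocks in the question read ecx before writing it, write edx before reading it, and end with ret 000008h, so this heuristic would label them thiscall.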

answered Oct 23 '22 13:10

user7116

