EDIT:
I want to distinguish statically linked library functions from functions written by the user within a compiled file (e.g. a PE file).
How to achieve that? (I am thinking of database comparison but I do not know any database.)
By the way, I already knew before asking this question that dynamically linked library functions are just entries in the import table of the PE file.
By library functions, I mean those defined in libraries, such as STL (I know this is a bad name).
By user-defined functions, I mean those written by individual programmers.
Is there any programmatic way to achieve this goal?
Right now I am thinking about comparing binaries with a database, but I do not know any database so far.
Please recommend a database or a different way as an answer. Thank you.
This answer assumes you want to analyze a standard Windows executable that is dynamically linked against import libraries (.lib files with associated .dll files, as opposed to statically linked code). If that is the case, you want to interpret the PE (Portable Executable) file structure.
Here's a good article to get you started, with sample code on dumping the PE header.
You will want to focus on the Import table (.idata section) for external library calls, and the Export table (.edata section) for calls defined inside the executable and marked as exportable (usually this only exists in .dll files).
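As a starting point for walking the PE structure, here is a minimal sketch in Python that lists the section names of a PE image using only the standard library. The function name `list_pe_sections` is made up for illustration, and the code assumes a well-formed file with no bounds checking beyond the two signatures. Note that linkers often merge the import data into another section such as .rdata, so a section literally named .idata may be absent; the data directories in the optional header are the authoritative pointers.

```python
import struct

def list_pe_sections(data: bytes):
    """Return (name, virtual_address, virtual_size, raw_offset, raw_size) per section."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF file header follows the 4-byte signature.
    (_machine, num_sections, _timestamp, _sym_ptr, _num_syms,
     opt_header_size, _characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    # The section table starts right after the optional header.
    table = pe_offset + 4 + 20 + opt_header_size
    sections = []
    for i in range(num_sections):
        entry = table + 40 * i  # each section header is 40 bytes
        raw_name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, entry)
        name = raw_name.rstrip(b"\0").decode("ascii", errors="replace")
        sections.append((name, vaddr, vsize, raw_ptr, raw_size))
    return sections
```

Typical use would be `list_pe_sections(open(path, "rb").read())`, where `path` is whatever executable you are inspecting.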
Static libraries are archives of COFF object files, and the DUMPBIN utility that ships with Visual Studio lets you quickly peer into your .lib files and even dump a disassembly of the code if you want.
The DUMPBIN utility, which is provided with the 32-bit version of Microsoft Visual C++, combines the abilities of the LINK, LIB, and EXEHDR utilities. The combination of these tools features the ability to provide information about the format and symbols provided in executable, library, and DLL files.
For information on the structure of COFF files, see this article.
Figuring out whether a called function comes from a lib would be tricky, but from what I remember, most static lib calls in code actually go through thunks (a simple jmp to the actual object code copied in from the lib) that are small (usually around 5 bytes), while "user defined" functions are not thunks and set up a bp-based stack frame.
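The thunk heuristic can be sketched roughly as follows, assuming 32-bit x86 code and that you already have the raw bytes of the code section plus the address at which they are loaded. The helper names `jmp_thunk_target` and `has_frame_prologue` are hypothetical, and the heuristic itself is only a guess: optimized builds often omit the frame pointer, and thunks can appear for other reasons.

```python
import struct

def jmp_thunk_target(code: bytes, offset: int, address: int):
    """If the 5 bytes at `offset` are `jmp rel32` (opcode E9), return the
    absolute jump target; otherwise return None.
    `address` is the virtual address of the byte at `offset`."""
    if offset < 0 or offset + 5 > len(code) or code[offset] != 0xE9:
        return None
    (rel,) = struct.unpack_from("<i", code, offset + 1)  # signed displacement
    # The displacement is relative to the end of the 5-byte instruction.
    return (address + 5 + rel) & 0xFFFFFFFF

def has_frame_prologue(code: bytes, offset: int) -> bool:
    """True if the bytes at `offset` are `push ebp; mov ebp, esp`
    (either of the two common encodings of the mov)."""
    return code[offset:offset + 3] in (b"\x55\x8b\xec", b"\x55\x89\xe5")
```

A call target for which `jmp_thunk_target` returns an address is a thunk candidate; one for which `has_frame_prologue` is true looks like an ordinary framed function.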
When your program is linked, static library functions and user-defined functions are included file by file.
So if you dump the headers of a PE file and look at the symbol table (using objdump -x if you use mingw32, or any similar tool), you will see the name of a file followed by all the functions that come from it, then another file name and its functions, and so on.
Or, if you have debug information, this may be easier.
Once you have associated each function with a file, you can sort the functions by analysing the file names: look at the extension (.c / .lib / .a) or check against a list of files you keep somewhere. Be careful to eliminate the crt0 files.
However, this is kind of a tricky solution, and I'm not sure it will work for every program.
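The grouping step described above could look something like this in Python. It is only a sketch: it assumes you have already extracted the symbol table into a list of `(name, is_file_marker)` pairs in table order (for example from the file entries that precede each group of symbols), and the names `group_by_file`, `classify`, and `known_library_files` are invented for illustration.

```python
import os

USER_EXTENSIONS = {".c", ".cc", ".cpp", ".cxx"}
LIBRARY_EXTENSIONS = {".lib", ".a"}

def group_by_file(symbols):
    """Group symbol names under the most recent file marker.
    `symbols` is an iterable of (name, is_file_marker) in symbol-table order."""
    groups = {}
    current = None
    for name, is_file in symbols:
        if is_file:
            current = name
            groups.setdefault(current, [])
        elif current is not None:
            groups[current].append(name)
    return groups

def classify(filename, known_library_files=()):
    """Guess whether a file name belongs to the runtime, a library, or the user."""
    base = os.path.basename(filename).lower()
    if base.startswith("crt"):  # startup files such as crt0.c or crt1.c
        return "runtime"
    if base in {f.lower() for f in known_library_files}:
        return "library"
    ext = os.path.splitext(base)[1]
    if ext in LIBRARY_EXTENSIONS:
        return "library"
    if ext in USER_EXTENSIONS:
        return "user"
    return "unknown"
```

Because library sources are often .c files too, the extension check alone will misclassify them; the optional `known_library_files` list is there to catch those cases.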