
Reverse engineering C++

Today I decided to decompile a simple "Hello world" program written in Visual C++, using IDA Pro.

From previous experience I was sure I would not find an immediate call to printf at the executable's entry point, and I was right. I found a lot of code that I did not write, which was added by the compiler during the compilation process.

I would like a better understanding of what code is added during the compilation process. What does it do? Are there any "tricks" to quickly find "main" and skip all the unnecessary compiler-generated code that shows up in the disassembly?

The best I could find was this post: http://www.codeproject.com/Articles/4210/C-Reverse-Disassembly, which says the execution order of an executable compiled with Visual C++ is as follows:

  1. CrtlStartUp

  2. main

  3. CrtlCleanUp

Could I please get a more detailed answer?

Michael asked Apr 19 '12 16:04


2 Answers

There are various things required by the C++ standard that you will likely encounter.

Most importantly, there needs to be code that handles the construction of any statics in the main translation unit before main is called, and code that handles their destruction after main returns. Additionally, the standard requires a function atexit that allows you to register additional functions to be called after main returns.
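As a rough illustration of that ordering, here is a minimal sketch. It is not what any particular runtime does internally; Tracer, g_constructed and the handler names are made up for this example.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical counter; zero-initialized before any constructor runs.
int g_constructed = 0;

// Hypothetical type whose constructor and destructor announce themselves.
struct Tracer {
    Tracer()  { ++g_constructed; std::puts("Tracer constructed"); }
    ~Tracer() { std::puts("Tracer destroyed"); }
};

// Static storage duration: the startup code has to run this constructor
// (typically before main), and the shutdown code has to run the destructor.
static Tracer g_tracer;

void first_handler()  { std::puts("first_handler"); }
void second_handler() { std::puts("second_handler"); }

// Meant to be called from main. std::atexit returns 0 on success.
bool register_exit_handlers() {
    return std::atexit(first_handler) == 0 &&
           std::atexit(second_handler) == 0;
}
```

If main calls register_exit_handlers() and then returns, the handlers run in reverse order of registration (second_handler, then first_handler), and the Tracer destructor runs after both, because g_tracer finished construction before either handler was registered. None of those calls appear in main itself, which is why they show up in the disassembly as code you did not write.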

So at a minimum, the startup code needs to be able to build the data structure of functions that will be called on return from main. This is a dynamic data structure because the program needs to be able to add to it at runtime, and the calls are made in the reverse of the order of registration (so typically you want a data structure that makes it easy to add at the end you walk from).
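A data structure with those properties can be sketched as a simple stack of function pointers. This is only an assumption about the general shape; real runtimes may use fixed tables, linked lists or something else entirely, and ExitRegistry and the sample handlers are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using ExitFn = void (*)();

// Hypothetical stand-in for the runtime's list of exit functions.
class ExitRegistry {
public:
    // Registration appends at the end, which is also where the walk starts.
    void add(ExitFn fn) { handlers_.push_back(fn); }

    std::size_t pending() const { return handlers_.size(); }

    // Walk from the most recently registered handler back to the first.
    void run_all() {
        while (!handlers_.empty()) {
            ExitFn fn = handlers_.back();
            handlers_.pop_back();  // pop first, so a handler may register more
            fn();
        }
    }

private:
    std::vector<ExitFn> handlers_;
};

// Sample handlers that record the order in which they ran.
std::string g_order;
void handler_a() { g_order += 'a'; }
void handler_b() { g_order += 'b'; }
void handler_c() { g_order += 'c'; }
```

Registering handler_a, handler_b and handler_c in that order and then calling run_all() leaves "cba" in g_order, which mirrors the reverse-of-registration rule.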

But additionally, the standard requires that statics in other translation units are created before any function in that translation unit is executed. Often, compilers will simply arrange everything in the linker so that it all gets called before main, but that is not required. Compilers that do things differently need to provide thunks to the initialisation routines of the other translation units, which will be called on the first function call.

This alone is quite a bit of work if you use any of the standard library. Remember, std::cout is a static object (static lifetime, not static linkage - confusingly overloaded word alert). So that means setting up communication with your console output, which will call whatever APIs your platform needs. There are many such objects in the standard.

And then, there may be stuff specific to your platform and/or compiler that prepares the process in some useful way, or parses environment variables, or loads "standard" dynamic/shared libraries, or similar stuff.

Typically, exit is just walking that list and somehow providing the return value of main to the environment, since most modern OSes clean up after themselves, but there may be system specific stuff in addition to that.

ex0du5 answered Oct 19 '22 07:10


Today's compilers create massive executables, so even once you find the entry point it will take you a while to understand the code and get to the section you actually need.

In your case, with a hello world app, you can use IDA to find the entry point in the function list dialog (I don't remember the exact name). But again, I don't recommend this approach unless the app is very small.

I call the approach I use the "down to top" approach (c).

I would start by analysing the current behaviour of the application without involving any tools. It's a very important step that will save a lot of time, as you'll know what you are looking for and when it happens. Then determine "weak points" such as strings and constant values that you can find with static analysis tools (IDA).

The next step is to disassemble the app and look for those "weak points" (the strings module in IDA), then find references to them to see which functions use them (you can use the graphical hierarchy view in newer IDA versions).
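To make the string-hunting step concrete, here is a rough sketch of the idea behind a strings listing: scan the raw bytes for runs of printable ASCII and remember where each run starts. It is a simplified assumption and not how IDA implements it (no UTF-16, no section awareness); FoundString and extract_strings are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: where a printable run starts and what it contains.
struct FoundString {
    std::size_t offset;
    std::string text;
};

// Collect runs of printable ASCII that are at least min_len bytes long.
std::vector<FoundString> extract_strings(const std::vector<unsigned char>& data,
                                         std::size_t min_len = 4) {
    std::vector<FoundString> found;
    std::string current;
    std::size_t start = 0;

    for (std::size_t i = 0; i < data.size(); ++i) {
        const unsigned char c = data[i];
        const bool printable = (c >= 0x20 && c <= 0x7e);
        if (printable) {
            if (current.empty()) start = i;  // a new run begins here
            current.push_back(static_cast<char>(c));
        } else {
            if (current.size() >= min_len) found.push_back({start, current});
            current.clear();
        }
    }
    // A run may extend to the very end of the buffer.
    if (current.size() >= min_len) found.push_back({start, current});
    return found;
}
```

The offsets are the useful part: once you know where "Hello world" lives in the file, you can look for the code that references that address and work back to the function that uses it.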

If you still can't work out how it works, or the code is called from many places and you don't know which one you need, you can move on to runtime analysis and use a debugger (softice? :)) like OllyDbg. This will show you things that are not visible with static analysis, such as calls through virtual functions or function pointers, for example: call EAX.
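For reference, these are the kinds of source constructs that tend to end up as an indirect call such as call EAX. This is a minimal sketch with made-up names, and whether the compiler really emits an indirect call depends on the optimisation settings, since it may inline or devirtualise the call.

```cpp
// Hypothetical class hierarchy with one virtual function.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};
struct Triangle : Shape { int sides() const override { return 3; } };
struct Square   : Shape { int sides() const override { return 4; } };

// The target depends on the dynamic type of s, so the call usually goes
// through the object's vtable instead of to a fixed address.
int count_sides(const Shape& s) { return s.sides(); }

using BinaryOp = int (*)(int, int);
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

// The target is whatever pointer the caller passed in at runtime.
int apply(BinaryOp op, int a, int b) { return op(a, b); }
```

In a static listing neither count_sides nor apply names the function that actually runs; a breakpoint on the indirect call in a debugger shows the real target in the register.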

Then you just keep stepping through until you get what you need.

victor.t answered Oct 19 '22 08:10