I would like to know if anyone can help me out with a problem I am having when studying one of the lecture slides from an introductory assembly class that I am taking in school. The problem I am having is not understanding the assembly, it is how exactly the C source code is ordered based on the assembly. I will post the snippet I am talking about and maybe it will be clearer what I am talking about. C Source given: <pre class="prettyprint"><code>int arith(int x, int y, int z) { int t1 = x+y; int t2 = z+t1; int t3 = x+4; int t4 = y * 48; int t5 = t3 + t4; int rval = t2 * t5; return rval; } </code></pre> Assembly given: <pre class="prettyprint"><code>arith: pushl %ebp movl %esp,%ebp movl 8(%ebp),%eax movl 12(%ebp),%edx leal (%edx,%eax),%ecx leal (%edx,%edx,2),%edx sall $4,%edx addl 16(%ebp),%ecx leal 4(%edx,%eax),%eax imull %ecx,%eax movl %ebp,%esp popl %ebp ret </code></pre> I am just confused as to how I am supposed to be able to discern for example that the adding of <code>z + t1</code> (<code>z + x + y</code>) is listed on the second line(in the source) when in the assembly it comes after the <code>y * 48</code> in the assembly code or for example that <code>x + 4</code> is the 3rd line when in the assembly it is not even in a line by itself, its sort of mixed in with the last <code>leal</code> statement. It makes sense to me when I have the source but I am supposed to be able to reproduce the source for a test and I do understand that the compiler optimizes things but if anyone has a way of thinking about the reverse engineering that could help me out I would greatly appreciate it if they could walk me through their thought process. Thanks.

You can't reproduce the original source, you can only reproduce an equivalent source. In your case the calculation for <code>t2</code> can appear anywhere after <code>t1</code> and before <code>retval</code>. The source might even have been a single expression: <pre class="prettyprint"><code>return (x+y+z) * ((x+4) + (y * 48)); </code></pre>

Decompilation is not perfectly achievable: there is some knowledge loss when going from the source code (where comments & names gave you a clue of the original programmer's intent) to binary machine code (where instructions are to be executed by the processor).

Reverse engineering C-source code from assembly

Tags:

c

assembly

reverse-engineering

I would like to know if anyone can help me out with a problem I am having when studying one of the lecture slides from an introductory assembly class that I am taking in school. The problem I am having is not understanding the assembly, it is how exactly the C source code is ordered based on the assembly. I will post the snippet I am talking about and maybe it will be clearer what I am talking about.

C Source given:

int arith(int x, int y, int z)
{ 
   int t1 = x+y;
   int t2 = z+t1;
   int t3 = x+4;
   int t4 = y * 48; 
   int t5 = t3 + t4;
   int rval = t2 * t5;
   return rval;
}

Assembly given:

arith:
pushl %ebp
movl %esp,%ebp

movl 8(%ebp),%eax
movl 12(%ebp),%edx
leal (%edx,%eax),%ecx
leal (%edx,%edx,2),%edx
sall $4,%edx
addl 16(%ebp),%ecx
leal 4(%edx,%eax),%eax
imull %ecx,%eax

movl %ebp,%esp
popl %ebp
ret

I am just confused as to how I am supposed to be able to discern for example that the adding of z + t1 (z + x + y) is listed on the second line(in the source) when in the assembly it comes after the y * 48 in the assembly code or for example that x + 4 is the 3rd line when in the assembly it is not even in a line by itself, its sort of mixed in with the last leal statement. It makes sense to me when I have the source but I am supposed to be able to reproduce the source for a test and I do understand that the compiler optimizes things but if anyone has a way of thinking about the reverse engineering that could help me out I would greatly appreciate it if they could walk me through their thought process.

Thanks.

791

asked Nov 13 '11 18:11

cpowel2

4 Answers

I've broken down the disassembly for you to show how the assembly was produced from the C source.

8(%ebp) = x, 12(%ebp) = y, 16(%ebp) = z

arith:

Create the stack frame:

pushl %ebp
movl %esp,%ebp

Move x into eax, y into edx:

movl 8(%ebp),%eax
movl 12(%ebp),%edx

t1 = x + y. leal (Load effective address) will add edx and eax, and t1 will be in ecx:

leal (%edx,%eax),%ecx

int t4 = y * 48; in two steps below, multiply by 3, then by 16. t4 will eventually be in edx:

Multiply edx by 2, and add edx to the result, ie. edx = edx * 3:

leal (%edx,%edx,2),%edx

Shift left 4 bits, ie. multiply by 16:

sall $4,%edx

int t2 = z+t1;. ecx initially holds t1, z is at 16(%ebp), at the end of the instruction ecx will be holding t2:

addl 16(%ebp),%ecx

int t5 = t3 + t4;. t3 was simply x + 4, and rather than calculating and storing t3, the expression of t3 is placed inline. This instruction essential does (x+4) + t4, which is the same as t3 + t4. It adds edx (t4) and eax (x), and adds 4 as an offset to achieve that result.

leal 4(%edx,%eax),%eax

int rval = t2 * t5; Fairly straight-forward this one; ecx represents t2 and eax represents t5. The return value is passed back to the caller through eax.

imull %ecx,%eax

Destroy the stack frame and restore esp and ebp:

movl %ebp,%esp
popl %ebp

Return from the routine:

ret

From this example you can see that the result is the same, but the structure is a bit different. Most likely this code was compiled with some sort of optimization or someone wrote it themself like that to demonstrate a point.

As others have said, you can't go exactly back to the source from the disassembly. It's up to the interpretation of the person reading the assembly to come up with equivalent C code.

To help with learning assembly and understanding the disassembly of your C programs, you can do the following on Linux:

Compile with debug information (-g), which will embed the source:

gcc -c -g arith.c

If you're on a 64-bit machine, you can tell the compiler to create a 32-bit binary with the -m32 flag (I did so for the example below).

Use objdump to dump the object file with it's source interleaved:

objdump -d -S arith.o

-d = disassembly, -S = display source. You can add -M intel-mnemonic to use the Intel ASM syntax if you prefer that over the AT&T syntax that your example uses.

Output:

arith.o:     file format elf32-i386


Disassembly of section .text:

00000000 <arith>:
int arith(int x, int y, int z)
{ 
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 20                sub    $0x20,%esp
   int t1 = x+y;
   6:   8b 45 0c                mov    0xc(%ebp),%eax
   9:   8b 55 08                mov    0x8(%ebp),%edx
   c:   01 d0                   add    %edx,%eax
   e:   89 45 fc                mov    %eax,-0x4(%ebp)
   int t2 = z+t1;
  11:   8b 45 fc                mov    -0x4(%ebp),%eax
  14:   8b 55 10                mov    0x10(%ebp),%edx
  17:   01 d0                   add    %edx,%eax
  19:   89 45 f8                mov    %eax,-0x8(%ebp)
   int t3 = x+4;
  1c:   8b 45 08                mov    0x8(%ebp),%eax
  1f:   83 c0 04                add    $0x4,%eax
  22:   89 45 f4                mov    %eax,-0xc(%ebp)
   int t4 = y * 48; 
  25:   8b 55 0c                mov    0xc(%ebp),%edx
  28:   89 d0                   mov    %edx,%eax
  2a:   01 c0                   add    %eax,%eax
  2c:   01 d0                   add    %edx,%eax
  2e:   c1 e0 04                shl    $0x4,%eax
  31:   89 45 f0                mov    %eax,-0x10(%ebp)
   int t5 = t3 + t4;
  34:   8b 45 f0                mov    -0x10(%ebp),%eax
  37:   8b 55 f4                mov    -0xc(%ebp),%edx
  3a:   01 d0                   add    %edx,%eax
  3c:   89 45 ec                mov    %eax,-0x14(%ebp)
   int rval = t2 * t5;
  3f:   8b 45 f8                mov    -0x8(%ebp),%eax
  42:   0f af 45 ec             imul   -0x14(%ebp),%eax
  46:   89 45 e8                mov    %eax,-0x18(%ebp)
   return rval;
  49:   8b 45 e8                mov    -0x18(%ebp),%eax
}
  4c:   c9                      leave  
  4d:   c3                      ret

As you can see, without optimizations the compiler produces a larger binary than the example you have. You can play around with that and add a compiler optimization flag when compiling (ie. -O1, -O2, -O3). The higher the optimization level, the more abstract the disassembly's going to seem.

For example, with just level 1 optimization (gcc -c -g -O1 -m32 arith.c1), the assembly code produced is a lot shorter:

00000000 <arith>:
int arith(int x, int y, int z)
{ 
   0:   8b 4c 24 04             mov    0x4(%esp),%ecx
   4:   8b 54 24 08             mov    0x8(%esp),%edx
   int t1 = x+y;
   8:   8d 04 11                lea    (%ecx,%edx,1),%eax
   int t2 = z+t1;
   b:   03 44 24 0c             add    0xc(%esp),%eax
   int t3 = x+4;
   int t4 = y * 48; 
   f:   8d 14 52                lea    (%edx,%edx,2),%edx
  12:   c1 e2 04                shl    $0x4,%edx
   int t5 = t3 + t4;
  15:   8d 54 11 04             lea    0x4(%ecx,%edx,1),%edx
   int rval = t2 * t5;
  19:   0f af c2                imul   %edx,%eax
   return rval;
}
  1c:   c3                      ret

116

answered Oct 24 '22 17:10

AusCBloke

You can't reproduce the original source, you can only reproduce an equivalent source.

In your case the calculation for t2 can appear anywhere after t1 and before retval.

The source might even have been a single expression:

return (x+y+z) * ((x+4) + (y * 48));

answered Oct 24 '22 16:10

CodesInChaos

When reverse engineering, you don't care about the original source code line by line, you care about what it does. A side effect is that you see what the code does, not what the programmer intended the code to do.

answered Oct 24 '22 18:10

ninjalj

Decompilation is not perfectly achievable: there is some knowledge loss when going from the source code (where comments & names gave you a clue of the original programmer's intent) to binary machine code (where instructions are to be executed by the processor).

answered Oct 24 '22 16:10

Basile Starynkevitch

Related questions
                            
                                Optimizing for speed - 4 dimensional array lookup in C
                            
                                Is it a good idea to define a variable in a local block for a case of a switch statement?
                            
                                Warning: pointer of type 'void *' used in subtraction
                            
                                howto create a filesystem like /proc?
                            
                                How do I increment an enum in VS C++ 6.0?
                            
                                How to copy text file in C or C++?
                            
                                Which of these options is good practice for assigning a string value to a variable in C?
                            
                                What does the following code do?
                            
                                Making a char function parameter const?
                            
                                what integer width processed faster in C?
                            
                                C to C++ : Transitioning from one language to the other [duplicate]
                            
                                Parsing XML in Pure C
                            
                                C Question: why char actually occupies 4 bytes in memory?
                            
                                What is the # for when formatting using %s
                            
                                Multiplying two 3x3 matrices in C
                            
                                How to print both to stdout and file in C
                            
                                Random access of a large binary file
                            
                                How can I call notepad.exe from a C program?
                            
                                Precedence of Logical Operators in C [duplicate]
                            
                                Why does this (i = ++i % 3) generate a warning: "may be undefined"? [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With