Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding assembly language

Tags:

c

assembly

I am trying to understand this assembly code in relation to the C code above. I am not sure if I am on the right track so maybe somebody can help me make a better understanding at this.

int silly(int n, int *p)
{
    int val, val2;
    if (n > 0)
        val2 = silly(n << 1, &val);
    else
        val = val2 = 0;
    *p = val + val2 + n;
    return val + val2;
}

This yields the following machine code:

silly:
pushl %ebp             // Here I am making space for the function on the stack
movl %esp,%ebp         // Moving the stack pointer where the base pointer is
subl $20,%esp          // Subtracting 20 from the stack pointer to allocate more space
pushl %ebx             // Pushing the %ebx register on top of the stack
movl 8(%ebp),%ebx      // Getting the first argument(which is n) and store it in register %ebx
testl %ebx,%ebx        // The first if-statement which compares if n > 0
jle .L3                // Jump if less or equal - meaning if n < 0 then jump to .L3
addl $-8,%esp          // Add -8 to %esp to allocate more space
leal -4(%ebp),%eax     // Storing the first local variable (which is val) in %eax
pushl %eax             // Pushing the register %eax on top of the stack
leal (%ebx,%ebx),%eax  // n + n and stores it as 2n in %eax
pushl %eax             // Pushing register %eax on top of the stack (Which I find strange
                       // considering that I've just pushed %eax onto the stack above
call silly             // Call the function silly
jmp .L4                // Jump to .L4 (Unconditionally)
.p2align 4,,7          // Don't know what this means.
.L3:                   // .L3 is the else-statement
xorl %eax,%eax         // Basically making %eax = 0
movl %eax,-4(%ebp)     // Moving the value in %eax which is 0 to the first local variable 
                       // meaning val = 0
.L4:                   // .L4 is the section after the else-statement
movl -4(%ebp),%edx     // Getting val again and now storing it in %edx
addl %eax,%edx         // Adding what is in %eax (which is 0) to %edx
movl 12(%ebp),%eax     // Getting the second parameter (*p) and storing it in %eax
addl %edx,%ebx         // Adding value from %edx to %ebx - meaning val + n
movl %ebx,(%eax)       // Moving what is in %ebx and storing it in memory location of %eax
movl -24(%ebp),%ebx    // Getting the second local variable (val2) and moving it to %ebx
movl %edx,%eax         // Move val to %eax - and the return value will be in %eax
movl %ebp,%esp         
popl %ebp
ret

I am trying to wrap my head around this and I've just started thinking about assembly so pointers on the subject would be really nice. I have a couple of questions I need to ask about this assembly code which could help my understanding of the stack:

  • (a) Is the variable val stored on the stack?
    (b) If so, at what byte oset(relative to %ebp) is it stored?
    (c) Why is it necessary to store it on the stack?

  • (a) Is the variable val2 stored on the stack?
    (b) If so, at what byte oset(relative to %ebp) is it stored?
    (c) Why is it necessary to store it on the stack?

  • (a) What (if anything) is stored at -24(%ebp)?
    (b) If something is stored there, why is it necessary to store it?

  • (a) What (if anything) is stored at -8(%ebp)?
    (b) If something is stored there, why is it necessary to store it?

Thanks in advance :)

like image 848
drleifz Avatar asked Oct 09 '14 21:10

drleifz


People also ask

Is assembly language hard to understand?

Unsurprisingly, Assembly is crowned the most difficult language to learn on a beginner level followed by Haskell.

What is assembly language and how it works?

Assembly language (or Assembler) is a compiled, low-level computer language. It is processor-dependent, since it basically translates the Assembler's mnemonics directly into the commands a particular CPU understands, on a one-to-one basis. These Assembler mnemonics are the instruction set for that processor.


1 Answers

Before answering your questions. Rather than commenting what the code is doing, I comment where all the values are in registers or on the stack.

Arguments are on the stack, return value is in %eax.

Registers %eax, %ecx, and %edx are caller saved. All other registers, including %ebx, %ebp, and %esp, are callee-saved (%edi and %esi are unused).

My notation for the stack is 4 bytes at a time and I use ; for where ebp points, if known.

silly:                     ; eax: ?, ebx: ebx0, edx: ?, stack: [eip0, n, p]
    pushl %ebp             ; eax: ?, ebx: ebx0, edx: ?, stack: [ebp0, eip0, n, p]
    movl %esp,%ebp         ; eax: ?, ebx: ebx0, edx: ?, stack: [; ebp0, eip0, n, p]
    subl $20,%esp          ; eax: ?, ebx: ebx0, edx: ?, stack: [?, ?, ?, ?, ?; ebp0, eip0, n, p]
    pushl %ebx             ; eax: ?, ebx: ebx0, edx: ?, stack: [ebx0, ?, ?, ?, ?, ?; ebp0, eip0, n, p]
    movl 8(%ebp),%ebx      ; eax: ?, ebx: n, edx: ?, stack: [ebx0, ?, ?, ?, ?, ?; ebp0, eip0, n, p]
    testl %ebx,%ebx        ; set flags from n
    jle .L3                ; if flags indicates <= 0, goto .L3, else fallthrough

                           ; set up for calling the function
    addl $-8,%esp          ; eax: ?, ebx: n, edx: ?, stack: [?, ?, ebx0, ?, ?, ?, ?, ?; ebp0, eip0, n, p]
    leal -4(%ebp),%eax     ; eax: &val, ebx: n, edx: ?, stack: [?, ?, ebx0, ?, ?, ?, ?, (stackeax); ebp0, eip0, n, p]
    pushl %eax             ; eax: &val, ebx: n, edx: ?, stack: [&val, ?, ?, ebx0, ?, ?, ?, ?, val=?; ebp0, eip0, n, p]
    leal (%ebx,%ebx),%eax  ; eax: 2*n, ebx: n, edx: ?, stack: [&val, ?, ?, ebx0, ?, ?, ?, ?, val=?; ebp0, eip0, n, p]
    pushl %eax             ; eax: 2*n, ebx: n, edx: ?, stack: [2*n, &val, ?, ?, ebx0, ?, ?, ?, ?, val=?; ebp0, eip0, n, p]
    call silly             ; pushes eip; args: (2*n, &val); val will be initialized on return
    jmp .L4                ;
                           ;
.p2align 4,,7              ; request alignment (there should be one before `silly:` too)
.L3:                       ;
    xorl %eax,%eax         ; eax: val=0, ebx: n, edx: ?, stack: [ebx0, ?, ?, ?, ?, ?; ebp0, eip0, n, p]
    movl %eax,-4(%ebp)     ; eax: val=0, ebx: n, edx: ?, stack: [ebx0, ?, ?, ?, ?, val; ebp0, eip0, n, p]
                           ;
.L4:                       ; eax: val2=φ(function result, 0), ebx: n, edx: ?, stack: [..., ebx0, ?, ?, ?, ?, val; ebp0, eip0, n, p]
    movl -4(%ebp),%edx     ; eax: val2, ebx: n, edx: val, stack: [..., ebx0, ?, ?, ?, ?, val; ebp0, eip0, n, p]
    addl %eax,%edx         ; eax: val2, ebx: n, edx: val+val2, stack: [..., ebx0, ?, ?, ?, ?, val; ebp0, eip0, n, p]
    movl 12(%ebp),%eax     ; eax: p, ebx: n, edx: val+val2, stack: [..., ebx0, ?, ?, ?, ?, val; ebp0, eip0, n, p]
    addl %edx,%ebx         ; eax: p, ebx: n+val+val2, edx: val+val2, stack: [..., ebx0, ?, ?, ?, ?, val; ebp0, eip0, n, p]
    movl %ebx,(%eax)       ; *p = n+val+val2
    movl -24(%ebp),%ebx    ; eax: p, ebx: ebx0, edx: val+val2, stack: [..., ebx0, ?, ?, ?, ?, val; ebp0, eip0, n, p]
    movl %edx,%eax         ; eax: val+val2, ebx: ebx0, edx: val+val2, stack: [..., ebx0, ?, ?, ?, ?, val; ebp0, eip0, n, p]
    movl %ebp,%esp         ; eax: val+val2, ebx: ebx0, edx: val+val2, stack: [; ebp0, eip0, n, p]
    popl %ebp              ; eax: val+val2, ebx: ebx0, edx: val+val2, stack: [eip0, n, p]
    ret                    ; eax: val+val2, ebx: ebx0, edx: val+val2, stack: [n, p]

STOP.

Go back and reread the code again. You're only hurting yourself if you don't figure out the answers on your own. It should be easy with the sort of comments I wrote.

But anyway ...


  1. a. val is usually on the stack, at -4(%ebp). The only time it isn't is on the line xorl %eax,%eax
    b. it is stored at -4(%ebp), as evidenced on the lines leal -4(%ebp),%eax, movl %eax,-4(%ebp), and movl -4(%ebp),%edx. Additionally, the previous frame's val is *p
    c. val must be on the stack so that its address can be taken and passed to the recursive call.
  2. a. val2 is never stored on the stack, though most likely some of those ?s are the space reserved for it.
    b. it is stored in regixter eax at .L4, which on the first branch of the phi function is the return value of the recursive call, and on the second branch is the value 0, which was also stored in val.
    c. val2 never needs to be on the stack since its address is not taken, it does not exist before the recursive call so it does not need to be saved, and there are few enough registers in use that there is no need to spill.
  3. a. -24(%ebp) is the saved value of %ebx, from the line pushl %ebx
    b. %ebx is a callee-saved register, so its value must be preserved.
  4. a. No, there is nothing there.
    b. It most likely would have been val2 if it were necessary to spill. My best guess is that the other three ?s are reserved for the unused-as-of-the-recursive-call caller-saved registers: %eax, %ecx, and %edx.
like image 103
o11c Avatar answered Nov 15 '22 05:11

o11c