
Analyzing assembly code

Tags: c, assembly

 $ gcc -O2 -S test.c -----------------------(1)
      .file "test.c"
    .globl accum
       .bss
       .align 4
       .type accum, @object
       .size accum, 4
    accum:
       .zero 4
       .text
       .p2align 2,,3
    .globl sum
       .type sum, @function
    sum:
       pushl %ebp
       movl  %esp, %ebp
       movl  12(%ebp), %eax
       addl  8(%ebp), %eax
       addl  %eax, accum
       leave
       ret
       .size sum, .-sum
       .p2align 2,,3
    .globl main
       .type main, @function
    main:
       pushl %ebp
       movl  %esp, %ebp
       subl  $8, %esp
       andl  $-16, %esp
       subl  $16, %esp
       pushl $11
       pushl $10
       call  sum
       xorl  %eax, %eax
       leave
       ret
       .size main, .-main
       .section .note.GNU-stack,"",@progbits
       .ident   "GCC: (GNU) 3.4.6 20060404 (Red Hat 3.4.6-9)"

This is the assembly code generated from this C program:

#include <stdio.h>
int accum = 0;

int sum(int x,int y)
{
   int t = x+y;
   accum +=t;
   return t;
}

int main(int argc,char *argv[])
{
   int i = 0,x=10,y=11;
   i = sum(x,y);
   return 0;
}

Also, this is the object code generated from the above program:

$ objdump -d test.o -------------------------(2)

test.o:     file format elf32-i386

Disassembly of section .text:

00000000 <sum>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 45 0c                mov    0xc(%ebp),%eax
   6:   03 45 08                add    0x8(%ebp),%eax
   9:   01 05 00 00 00 00       add    %eax,0x0
   f:   c9                      leave
  10:   c3                      ret
  11:   8d 76 00                lea    0x0(%esi),%esi

00000014 <main>:
  14:   55                      push   %ebp
  15:   89 e5                   mov    %esp,%ebp
  17:   83 ec 08                sub    $0x8,%esp
  1a:   83 e4 f0                and    $0xfffffff0,%esp
  1d:   83 ec 10                sub    $0x10,%esp
  20:   6a 0b                   push   $0xb
  22:   6a 0a                   push   $0xa
  24:   e8 fc ff ff ff          call   25 <main+0x11>
  29:   31 c0                   xor    %eax,%eax
  2b:   c9                      leave
  2c:   c3                      ret

Ideally, listings (1) and (2) should be the same. But I see movl, pushl, etc. in listing (1), whereas listing (2) has mov and push. My questions are:

  1. Which is the correct assembly instruction actually executed on the processor?
  2. In listing (1), I see this in the beginning:

.file "test.c"
    .globl accum
       .bss
       .align 4
       .type accum, @object
       .size accum, 4
    accum:
       .zero 4
       .text
       .p2align 2,,3
    .globl sum
       .type sum, @function 

and this at end:

.size main, .-main
           .section .note.GNU-stack,"",@progbits
           .ident   "GCC: (GNU) 3.4.6 20060404 (Red Hat 3.4.6-9)"

What does this mean?

Thanks.

Onkar Mahajan asked Oct 26 '10 04:10



1 Answer

The instruction is MOV regardless of which variant is used. The l suffix is just a GCC / AT&T assembly convention that specifies the operand size, in this case 4-byte operands.

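In Intel syntax, where there is any ambiguity, the usual approach is to tag the memory operand with an indicator of the required size (e.g. BYTE, WORD, DWORD) instead of suffixing the instruction. It is just another way of achieving the same thing. Note that listing (2) is still in AT&T syntax (source operand first, % register prefixes); objdump simply omits the suffix where the operand size is already determined.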

89 e5 is the correct sequence of bytes for MOV from the 32-bit register ESP to the 32-bit register EBP. There is nothing wrong in either listing.


Specifies the file that this assembly code was generated from:

.file "test.c"

Says that accum is a global symbol (C variable with external linkage):

    .globl accum

The following bytes should be placed in the bss section, a section that takes no space in the object file but is allocated and zeroed at runtime.

       .bss

Aligned on a 4 byte boundary:

       .align 4

It's an object (a variable, not some code):

       .type accum, @object

It's four bytes:

       .size accum, 4

Here is where accum is defined, four zero bytes.

    accum:
       .zero 4

Now switch from the bss section to the text section, which is where functions are usually stored.

       .text

Add up to three bytes of padding to make sure we are on a 4 byte (2^2) boundary:

       .p2align 2,,3

sum is a global symbol and it's a function.

    .globl sum
       .type sum, @function 

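The size of main is "here" minus "where main started". In listing (2), main starts at 0x14 and its last byte is at 0x2c, so the size is 0x2d - 0x14 = 0x19 (25) bytes: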

.size main, .-main

This is where GCC-specific stack options are specified. An executable stack can be requested here (not very safe); the empty flags string in this listing asks for a non-executable stack (usually preferred).

       .section .note.GNU-stack,"",@progbits

Identify which version of the compiler generated this assembly:

       .ident   "GCC: (GNU) 3.4.6 20060404 (Red Hat 3.4.6-9)"
CB Bailey answered Oct 20 '22 08:10