
Is my conversion from assembly to C code correct?

Tags: c, assembly, x86-64

I have a problem that asks me to convert assembly code to C code.

I've made an attempt to convert the code and I think I got it mostly right, but I am confused by the leaq instruction.

looper:
   movl  $o, %eax
   movl  $o, %edx
   jmp   .L2

.L4:
   movq  (%rsi, %rdx, 8), %rcx
   cmpq  %rcx, %rax
   jl    .L3
   movq  %rax, %rcx

.L3:
   leaq  1(%rcx), %rax
   addq  $1, %rdx

.L2:
   cmpq  %rdi, %rdx
   jl    .L4
   rep ret

Here is the C code that I got:

long looper(long n, long *a) {
  long i;
  long x = 0;

  for (i = 0; i < n; i++) {
    if (x < a[i]) {
      x = a[i] + 1;
    }
    x = a[i];
  }

  return x;
}
Benjamin Rodriguez asked Aug 14 '19 21:08



1 Answer

First of all, there seems to be an error in your assembly: it looks like $o should be $0 (zero).

Your C code is almost correct, but there are two errors:

  1. The order of the two paths after the jl .L3 instruction matters. After cmpq %rcx, %rax, the jl branch is taken when %rax < %rcx, and the instruction movq %rax, %rcx is skipped. If the branch is NOT taken (that is, %rax >= %rcx), that move is executed before falling through to .L3. So you basically swapped the two cases in your C code.

  2. The value of a[i] is not used directly every time; it is first loaded into the %rcx register. Both movq (%rsi, %rdx, 8), %rcx and movq %rax, %rcx assign to %rcx, and leaq 1(%rcx), %rax then computes %rcx + 1 into %rax (here leaq does plain arithmetic and does not access memory), so %rcx should be treated as a separate variable. This means that writing x = a[i] + 1; is wrong. It should be:

    tmp = a[i];
    /* ... */
    x = tmp + 1; 
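
Both points can be seen in a literal, goto-style transcription of the assembly. This is only a sketch: it assumes $o is read as $0, the name looper_goto and the register-named variables are made up for illustration, and the assert is a precondition check that does not exist in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Label-for-label transcription; each line mirrors one instruction. */
int64_t looper_goto(int64_t n, const int64_t *arr) {
    assert(arr != NULL || n <= 0); /* added for this sketch, not in the assembly */

    int64_t rax = 0;               /* movl $0, %eax */
    int64_t rdx = 0;               /* movl $0, %edx */
    int64_t rcx = 0;               /* %rcx; zeroed only to keep compilers quiet */
    goto L2;                       /* jmp .L2 */
L4:
    rcx = arr[rdx];                /* movq (%rsi,%rdx,8), %rcx */
    if (rax < rcx) goto L3;        /* cmpq %rcx, %rax ; jl .L3 */
    rcx = rax;                     /* movq %rax, %rcx (runs only when rax >= rcx) */
L3:
    rax = rcx + 1;                 /* leaq 1(%rcx), %rax (arithmetic, no load) */
    rdx += 1;                      /* addq $1, %rdx */
L2:
    if (rdx < n) goto L4;          /* cmpq %rdi, %rdx ; jl .L4 */
    return rax;                    /* rep ret */
}
```

Turning the forward jump over rcx = rax into an if with the inverted condition (rax >= rcx), and the backward jump into a loop, gives the structured form.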
    

The resulting C code should be something like this:

#include <stdint.h>

int64_t looper(int64_t n, int64_t *arr) {
    int64_t result; // rax
    int64_t tmp;    // rcx
    int64_t i;      // rdx

    result = 0;

    for (i = 0; i < n; i++) {
        tmp = arr[i];

        if (result >= tmp)
            tmp = result;

        result = tmp + 1;
    }

    return result;
}
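
Read this way, each iteration sets result to the larger of result and arr[i], plus one. A compact sketch of that reading follows; looper_max and max_i64 are names invented here for illustration, and the assert is an added precondition check rather than something the assembly does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the larger of two 64-bit values. */
static int64_t max_i64(int64_t a, int64_t b) {
    return a >= b ? a : b;
}

/* Same behavior as the loop above, written with an explicit max. */
int64_t looper_max(int64_t n, const int64_t *arr) {
    assert(arr != NULL || n <= 0); /* added for this sketch */

    int64_t result = 0;
    for (int64_t i = 0; i < n; i++) {
        result = max_i64(result, arr[i]) + 1;
    }
    return result;
}
```

For example, with the array {5, 1, 10} and n = 3, the intermediate results are 6, 7 and 11, so the function returns 11.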

In addition, assembling your code with as -o prog prog.s and then disassembling it with Radare2 gives this rather simple control flow graph:

    looper ();
        0x08000040      mov eax, 0
        0x08000045      mov edx, 0
    ,=< 0x0800004a      jmp 0x8000060
    |
    |   ; JMP XREF from 0x08000063 (looper)
  .---> 0x0800004c      mov rcx, qword [rsi + rdx*8]
  | |   0x08000050      cmp rax, rcx
,=====< 0x08000053      jl 0x8000058
| | |   0x08000055      mov rcx, rax
| | |
| | |   ; JMP XREF from 0x08000053 (looper)
`-----> 0x08000058      lea rax, qword [rcx + 1]
  | |   0x0800005c      add rdx, 1
  | |
  | |   ; JMP XREF from 0x0800004a (looper)
  | `-> 0x08000060      cmp rdx, rdi
  `===< 0x08000063      jl 0x800004c
        0x08000065      ret
Marco Bonelli answered Oct 22 '22 16:10