I have a problem that asks me to convert assembly code to C code.
I've made an attempt to convert the code and I think I got it mostly right, but I am confused by the leaq instruction.
looper:
    movl $o, %eax
    movl $o, %edx
    jmp .L2
.L4:
    movq (%rsi, %rdx, 8), %rcx
    cmpq %rcx, %rax
    jl .L3
    movq %rax, %rcx
.L3:
    leaq 1(%rcx), %rax
    addq $1, %rdx
.L2:
    cmpq %rdi, %rdx
    jl .L4
    rep ret
Here is the C code that I got:
long looper(long n, long *a) {
    long i;
    long x = 0;
    for (i = 0; i < n; i++) {
        if (x < a[i]) {
            x = a[i] + 1;
        }
        x = a[i];
    }
    return x;
}
You can't deterministically convert assembly code to C. Interrupts, self-modifying code, and other low-level things have no representation in C other than inline assembly. An assembly-to-C conversion can only work to a limited extent.
So all you have to do is identify each mnemonic in the assembly source, map it to the corresponding machine opcode, and write that opcode out to a file, along with its operands (if any). You then repeat the process for each additional instruction in the source file.
Compiler: it converts high-level language code into assembly code; the assembler then converts the assembly code into machine code.
MOVSLQ moves and sign-extends a value from a 32-bit source to a 64-bit destination.
First of all, there seems to be an error in your assembly: looks like $o should be $0 (zero).
Your C code is almost correct, but there are some errors:
The direction of the branch taken by the jl .L3 instruction matters. If %rax < %rcx, the branch is taken and the instruction movq %rax, %rcx is skipped. If the branch is NOT taken, that move is executed before falling through into .L3. So you basically swapped the two branches in your C code.
The value of a[i] is not used directly every time; it is first saved in the %rcx register. Both movq %rax, %rcx and movq (%rsi, %rdx, 8), %rcx assign to %rcx, and the value is then passed from %rcx to %rax (leaq 1(%rcx), %rax simply computes %rcx + 1 and stores it in %rax, without accessing memory), so %rcx should be treated as a separate variable. This means that writing x = a[i] + 1; is wrong. It should be:
tmp = a[i];
/* ... */
x = tmp + 1;
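The two points above can be made concrete with a literal, label-for-label transliteration of the assembly into C using goto. This is only a sketch: the function name looper_goto and the variables named after registers are made up for illustration, the assert is a sanity check that the assembly does not contain, and the $o immediates are assumed to be $0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Label-for-label transliteration of the assembly (hypothetical name).
   n corresponds to %rdi and a to %rsi; rax, rcx, rdx mirror the registers. */
int64_t looper_goto(int64_t n, const int64_t *a) {
    /* Sanity check added for this sketch; the assembly has no such test. */
    assert(a != NULL || n <= 0);

    int64_t rax = 0;            /* movl $0, %eax */
    int64_t rdx = 0;            /* movl $0, %edx */
    int64_t rcx;
    goto L2;                    /* jmp .L2 */
L4:
    rcx = a[rdx];               /* movq (%rsi, %rdx, 8), %rcx */
    if (rax < rcx) goto L3;     /* cmpq %rcx, %rax ; jl .L3 */
    rcx = rax;                  /* movq %rax, %rcx (fall-through path only) */
L3:
    rax = rcx + 1;              /* leaq 1(%rcx), %rax: plain addition, no load */
    rdx += 1;                   /* addq $1, %rdx */
L2:
    if (rdx < n) goto L4;       /* cmpq %rdi, %rdx ; jl .L4 */
    return rax;                 /* rep ret */
}
```

Read this way, the jl .L3 branch skips only the single move that follows it, and the leaq line runs on every iteration regardless of which path was taken.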
The resulting C code should be something like this:
#include <stdint.h>

int64_t looper(int64_t n, int64_t *arr) {
    int64_t result; // rax
    int64_t tmp;    // rcx
    int64_t i;      // rdx
    result = 0;
    for (i = 0; i < n; i++) {
        tmp = arr[i];
        if (result >= tmp)
            tmp = result;
        result = tmp + 1;
    }
    return result;
}
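To see that the two versions really behave differently, here is a small side-by-side sketch. The names looper_attempt and looper_fixed are hypothetical renames of the code from the question and of the corrected code, the question's long is replaced by int64_t for consistency, and the asserts are checks added for the sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The attempt from the question, renamed (hypothetical name). */
int64_t looper_attempt(int64_t n, const int64_t *a) {
    assert(a != NULL || n <= 0);  /* check added for this sketch */
    int64_t x = 0;
    for (int64_t i = 0; i < n; i++) {
        if (x < a[i]) {
            x = a[i] + 1;
        }
        x = a[i];  /* unconditional overwrite: only the last element survives */
    }
    return x;
}

/* The corrected version, renamed (hypothetical name). */
int64_t looper_fixed(int64_t n, const int64_t *a) {
    assert(a != NULL || n <= 0);  /* check added for this sketch */
    int64_t result = 0;
    for (int64_t i = 0; i < n; i++) {
        int64_t tmp = a[i];
        if (result >= tmp)
            tmp = result;         /* tmp is now max(result, a[i]) */
        result = tmp + 1;
    }
    return result;
}
```

For the array {3, 1, 5} with n = 3, looper_attempt returns 5 (just the last element), while looper_fixed returns 6, because each iteration computes max(result, a[i]) + 1.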
As an addition, assembling your source using as -o prog prog.s and then disassembling the result with Radare2 gives this rather simple control flow graph:
looper ();
0x08000040 mov eax, 0
0x08000045 mov edx, 0
,=< 0x0800004a jmp 0x8000060
|
| ; JMP XREF from 0x08000063 (looper)
.---> 0x0800004c mov rcx, qword [rsi + rdx*8]
| | 0x08000050 cmp rax, rcx
,=====< 0x08000053 jl 0x8000058
| | | 0x08000055 mov rcx, rax
| | |
| | | ; JMP XREF from 0x08000053 (looper)
`-----> 0x08000058 lea rax, qword [rcx + 1]
| | 0x0800005c add rdx, 1
| |
| | ; JMP XREF from 0x0800004a (looper)
| `-> 0x08000060 cmp rdx, rdi
`===< 0x08000063 jl 0x800004c
0x08000065 ret