Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tracing/profiling instructions

I'd like to statistically profile my C code at the instruction level. I need to know how many additions, multiplications, divisions, etc I'm performing.

This is not your usual run of the mill code profiling requirement. I'm an algorithm developer and I want to estimate the cost of converting my code to hardware implementations. For this, I'm being asked the instruction call breakdown during run-time (parsing the compiled assembly isn't sufficient as it doesn't consider loops in the code).

After looking around, it seems VMware may offer a possible solution, but I still couldn't find the specific feature that will allow me to trace the instruction call stream of my process.

Are you aware of any profiling tools which enable this?

like image 764
LeChuck2k Avatar asked Jun 04 '10 06:06

LeChuck2k


1 Answers

QEMU user mode -d in_asm

This is another simple thing you can do to get an instruction trace:

sudo apt-get install qemu-user
qemu-x86_64 -d in_asm main.out

Let's test it with an x86_64 triple hello world:

main.S

.text
.global _start
_start:
asm_main_after_prologue:
    mov $3, %rbx
write:
    mov $1, %rax    /* syscall number */
    mov $1, %rdi    /* stdout */
    mov $msg, %rsi  /* buffer */
    mov $len, %rdx  /* len */
    syscall
    dec %rbx
    jne write
exit:
    mov $60, %rax   /* syscall number */
    mov $0, %rdi    /* exit status */
    syscall
msg:
    .ascii "hello\n"
len = . - msg

adapted from GitHub upstream.

Assemble and run with:

as -o main.o main.S 
ld -o main.out main.o
./main.out

Stdout output:

hello
hello
hello

Running it through QEMU outputs the instruction trace to stderr:

warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
host mmap_min_addr=0x10000
Reserved 0x1000 bytes of guest address space
Relocating guest address space from 0x0000000000400000 to 0x400000
guest_base  0x0
start            end              size             prot
0000000000400000-0000000000401000 0000000000001000 r-x
0000004000000000-0000004000001000 0000000000001000 ---
0000004000001000-0000004000801000 0000000000800000 rw-
start_brk   0x0000000000000000
end_code    0x00000000004000b8
start_code  0x0000000000400000
start_data  0x00000000004000b8
end_data    0x00000000004000b8
start_stack 0x00000040007fed70
brk         0x00000000004000b8
entry       0x0000000000400078
----------------
IN: 
0x0000000000400078:  mov    $0x3,%rbx
0x000000000040007f:  mov    $0x1,%rax
0x0000000000400086:  mov    $0x1,%rdi
0x000000000040008d:  mov    $0x4000b2,%rsi
0x0000000000400094:  mov    $0x6,%rdx
0x000000000040009b:  syscall 

----------------
IN: 
0x000000000040009d:  dec    %rbx
0x00000000004000a0:  jne    0x40007f

----------------
IN: 
0x000000000040007f:  mov    $0x1,%rax
0x0000000000400086:  mov    $0x1,%rdi
0x000000000040008d:  mov    $0x4000b2,%rsi
0x0000000000400094:  mov    $0x6,%rdx
0x000000000040009b:  syscall 

----------------
IN: 
0x00000000004000a2:  mov    $0x3c,%rax
0x00000000004000a9:  mov    $0x0,%rdi
0x00000000004000b0:  syscall 

I expect this method to be relatively fast. It works by reading input instructions and producing output instructions that the host can run, much like cachegrind which was mentioned at: https://stackoverflow.com/a/2971979/895245

One cool thing about this is that you can also trivially trace executables of other architectures, see for example aarch64: How does native android code written for ARM run on x86?

This method also displays the current symbol of unstripped executables, e.g. the trace of:

main.c

#include <stdio.h>

int say_hello() {
    puts("hello");
}

int main(void) {
    say_hello();
}

compile and run:

gcc -ggdb3 -O0 -o main.out main.c
qemu-x86_64 -d in_asm ./main.out

contains:

----------------
IN: main
0x0000000000400537:  push   %rbp
0x0000000000400538:  mov    %rsp,%rbp
0x000000000040053b:  mov    $0x0,%eax
0x0000000000400540:  callq  0x400526

----------------
IN: say_hello
0x0000000000400526:  push   %rbp
0x0000000000400527:  mov    %rsp,%rbp
0x000000000040052a:  mov    $0x4005d4,%edi
0x000000000040052f:  callq  0x400400

----------------
IN: 
0x0000000000400400:  jmpq   *0x200c12(%rip)        # 0x601018

It does not however show symbols within shared libraries such as puts.

But you can see them if you compile with -static:

----------------
IN: main
0x00000000004009bf:  push   %rbp
0x00000000004009c0:  mov    %rsp,%rbp
0x00000000004009c3:  mov    $0x0,%eax
0x00000000004009c8:  callq  0x4009ae

----------------
IN: say_hello
0x00000000004009ae:  push   %rbp
0x00000000004009af:  mov    %rsp,%rbp
0x00000000004009b2:  mov    $0x4a1064,%edi
0x00000000004009b7:  callq  0x40faa0

----------------
IN: puts
0x000000000040faa0:  push   %r12
0x000000000040faa2:  push   %rbp
0x000000000040faa3:  mov    %rdi,%r12
0x000000000040faa6:  push   %rbx
0x000000000040faa7:  callq  0x423830

Related: https://unix.stackexchange.com/questions/147343/how-to-determine-what-instructions-a-process-is-executing

Tested in Ubuntu 16.04, QEMU 2.5.0.