I'd like to statistically profile my C code at the instruction level. I need to know how many additions, multiplications, divisions, etc I'm performing.
This is not your usual run of the mill code profiling requirement. I'm an algorithm developer and I want to estimate the cost of converting my code to hardware implementations. For this, I'm being asked the instruction call breakdown during run-time (parsing the compiled assembly isn't sufficient as it doesn't consider loops in the code).
After looking around, it seems VMware may offer a possible solution, but I still couldn't find the specific feature that will allow me to trace the instruction call stream of my process.
Are you aware of any profiling tools which enable this?
QEMU user mode -d in_asm
This is another simple thing you can do to get an instruction trace:
sudo apt-get install qemu-user
qemu-x86_64 -d in_asm main.out
Let's test it with an x86_64 triple hello world:
main.S
.text
.global _start
_start:
asm_main_after_prologue:
mov $3, %rbx
write:
mov $1, %rax /* syscall number */
mov $1, %rdi /* stdout */
mov $msg, %rsi /* buffer */
mov $len, %rdx /* len */
syscall
dec %rbx
jne write
exit:
mov $60, %rax /* syscall number */
mov $0, %rdi /* exit status */
syscall
msg:
.ascii "hello\n"
len = . - msg
adapted from GitHub upstream.
Assemble and run with:
as -o main.o main.S
ld -o main.out main.o
./main.out
Stdout output:
hello
hello
hello
Running it through QEMU outputs the instruction trace to stderr:
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
host mmap_min_addr=0x10000
Reserved 0x1000 bytes of guest address space
Relocating guest address space from 0x0000000000400000 to 0x400000
guest_base 0x0
start end size prot
0000000000400000-0000000000401000 0000000000001000 r-x
0000004000000000-0000004000001000 0000000000001000 ---
0000004000001000-0000004000801000 0000000000800000 rw-
start_brk 0x0000000000000000
end_code 0x00000000004000b8
start_code 0x0000000000400000
start_data 0x00000000004000b8
end_data 0x00000000004000b8
start_stack 0x00000040007fed70
brk 0x00000000004000b8
entry 0x0000000000400078
----------------
IN:
0x0000000000400078: mov $0x3,%rbx
0x000000000040007f: mov $0x1,%rax
0x0000000000400086: mov $0x1,%rdi
0x000000000040008d: mov $0x4000b2,%rsi
0x0000000000400094: mov $0x6,%rdx
0x000000000040009b: syscall
----------------
IN:
0x000000000040009d: dec %rbx
0x00000000004000a0: jne 0x40007f
----------------
IN:
0x000000000040007f: mov $0x1,%rax
0x0000000000400086: mov $0x1,%rdi
0x000000000040008d: mov $0x4000b2,%rsi
0x0000000000400094: mov $0x6,%rdx
0x000000000040009b: syscall
----------------
IN:
0x00000000004000a2: mov $0x3c,%rax
0x00000000004000a9: mov $0x0,%rdi
0x00000000004000b0: syscall
I expect this method to be relatively fast. It works by reading input instructions and producing output instructions that the host can run, much like cachegrind which was mentioned at: https://stackoverflow.com/a/2971979/895245
One cool thing about this is that you can also trivially trace executables of other architectures, see for example aarch64: How does native android code written for ARM run on x86?
This method also displays the current symbol of unstripped executables, e.g. the trace of:
main.c
#include <stdio.h>
int say_hello() {
puts("hello");
}
int main(void) {
say_hello();
}
compile and run:
gcc -ggdb3 -O0 -o main.out main.c
qemu-x86_64 -d in_asm ./main.out
contains:
----------------
IN: main
0x0000000000400537: push %rbp
0x0000000000400538: mov %rsp,%rbp
0x000000000040053b: mov $0x0,%eax
0x0000000000400540: callq 0x400526
----------------
IN: say_hello
0x0000000000400526: push %rbp
0x0000000000400527: mov %rsp,%rbp
0x000000000040052a: mov $0x4005d4,%edi
0x000000000040052f: callq 0x400400
----------------
IN:
0x0000000000400400: jmpq *0x200c12(%rip) # 0x601018
It does not however show symbols within shared libraries such as puts.
But you can see them if you compile with -static
:
----------------
IN: main
0x00000000004009bf: push %rbp
0x00000000004009c0: mov %rsp,%rbp
0x00000000004009c3: mov $0x0,%eax
0x00000000004009c8: callq 0x4009ae
----------------
IN: say_hello
0x00000000004009ae: push %rbp
0x00000000004009af: mov %rsp,%rbp
0x00000000004009b2: mov $0x4a1064,%edi
0x00000000004009b7: callq 0x40faa0
----------------
IN: puts
0x000000000040faa0: push %r12
0x000000000040faa2: push %rbp
0x000000000040faa3: mov %rdi,%r12
0x000000000040faa6: push %rbx
0x000000000040faa7: callq 0x423830
Related: https://unix.stackexchange.com/questions/147343/how-to-determine-what-instructions-a-process-is-executing
Tested in Ubuntu 16.04, QEMU 2.5.0.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With