I'm profiling a C++ application using linux perf, and I'm getting a nice control flow graph using GProf2dot. However, some symbols from the C library (libc6-2.13.so) take a substantial portion of the total time, and yet have no in-edges.
For example:
_int_malloc
takes 8% of the time but has no call parents.__strcmp_sse42
and __cxxabiv1::__si_class_type_info::__do_dyncast
together take about 10% of the time, and have a caller whose name is 0
, which has callers 2d6935c
, 2cc748c
, and 6
, which have no callers.As a result, I can't find out which routines are responsible for all this mallocing and dynamic casting using just perf. However, it seems that other symbols (e.g. malloc
but not _int_malloc
) do have call parents.
Why doesn't perf show call parents for _int_malloc? Why can't I find the ultimate callers of __do_dyn_cast? And, is there some way for me to modify my setup so that I can get this information? I'm on x86-64, so I'm wondering if I need a (non-standard) libc6 with frame pointers.
Update: As of the 3.7.0 kernel, one can determine call parents of symbols in system libraries using perf record -gdwarf <command>
.
Using -gdwarf
, there is no need to compile with -fno-omit-frame-pointer
.
Original answer:
Yes, one probably would need a libc6 compiled with frame pointers (-fno-omit-framepointer
) on x86_64, at the moment (May 24, 2012).
However, developers are currently working on allowing the perf tools to use DWARF unwind info. This means that frame pointers are no longer needed to get backtrace information on x86_64. Linus, however, does not want a DWARF unwinder in the kernel. Thus, the perf tools will save registers as the system is running, and perform the DWARF unwinding in the userspace perf tool using the libunwind library.
This technique has been tested to successfully determine callers of (for example) malloc
and dynamic_cast
. However, the patch set is not yet integrated into the Linux kernel, and needs to undergo further revision before it is ready.
_int_malloc
and __do_dyn_cast
are being called from routines that the profiler can't identify because it doesn't have symbol table information for them.
What's more, it looks like you are showing self (exclusive) time. That is only useful for finding hotspots in routines that a) have much self time, and b) you can fix.
There's a reason profilers subsequent to the original unix profil
were created. Real software consists of functions that spend nearly all their time calling other functions, and you need to be able to find code that is on the stack much of the time, not that has the program counter much of the time.
So you need to configure perf
to take stack samples and tell you the percent of time each of your routines is on the stack.
It is even better if it reports not just routines, but lines of code, as in Zoom.
It is best to take the samples on wall-clock time, so you're not blind to IO.
There's more to say on all this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With