
How do I get call parents for libc6 symbols (e.g. _int_malloc) with linux perf?

I'm profiling a C++ application using Linux perf, and I'm getting a nice call graph using gprof2dot. However, some symbols from the C library (libc6-2.13.so) take a substantial portion of the total time, and yet have no in-edges.

For example:

  • _int_malloc takes 8% of the time but has no call parents.
  • __strcmp_sse42 and __cxxabiv1::__si_class_type_info::__do_dyncast together take about 10% of the time, and have a caller whose name is 0, which has callers 2d6935c, 2cc748c, and 6, which have no callers.

As a result, I can't find out which routines are responsible for all this mallocing and dynamic casting using just perf. However, it seems that other symbols (e.g. malloc but not _int_malloc) do have call parents.

Why doesn't perf show call parents for _int_malloc? Why can't I find the ultimate callers of __do_dyncast? And is there some way for me to modify my setup so that I can get this information? I'm on x86-64, so I'm wondering if I need a (non-standard) libc6 built with frame pointers.

BenRI asked Apr 18 '12 16:04



2 Answers

Update: As of the 3.7.0 kernel, one can determine call parents of symbols in system libraries using perf record -gdwarf <command>. Newer perf releases spell this option perf record --call-graph dwarf <command>.

Using -gdwarf, there is no need to compile with -fno-omit-frame-pointer.

Original answer (May 24, 2012): Yes, at the moment one would probably need a libc6 compiled with frame pointers (-fno-omit-frame-pointer) on x86_64.
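To illustrate why a library built without frame pointers can leave a symbol with no callers (or with a caller missing), here is a toy Python model of a frame-pointer walk. It is only a sketch: the frame layout, the stack addresses, and the application function work are assumptions made for illustration, and it is not how perf itself is implemented.

```python
# Toy model only. Stack memory is a dict {address: value}; code "addresses"
# are written as function names so the resulting trace is readable.
# Assumed frame layout: memory[fp]     = caller's saved frame pointer
#                       memory[fp + 8] = return address into the caller

def unwind(memory, ip, fp, max_depth=16):
    """Follow the frame-pointer chain and return the functions found."""
    trace = [ip]
    while fp in memory and (fp + 8) in memory and len(trace) < max_depth:
        trace.append(memory[fp + 8])  # return address into the caller
        fp = memory[fp]               # step to the caller's frame
    return trace

def push_frame(memory, fp, caller_fp, return_to):
    """Record one frame that was built with a frame pointer."""
    memory[fp] = caller_fp
    memory[fp + 8] = return_to

# Case 1: every function, including libc, keeps a frame pointer.
stack = {}
push_frame(stack, 0x1000, 0, "_start")       # frame of main
push_frame(stack, 0x0F00, 0x1000, "main")    # frame of work (hypothetical)
push_frame(stack, 0x0E00, 0x0F00, "work")    # frame of malloc
push_frame(stack, 0x0D00, 0x0E00, "malloc")  # frame of _int_malloc
with_fp = unwind(stack, "_int_malloc", 0x0D00)

# Case 2: libc omits frame pointers, so only main and work are chained.
app_only = {}
push_frame(app_only, 0x1000, 0, "_start")
push_frame(app_only, 0x0F00, 0x1000, "main")
# The frame-pointer register still points at work's frame: callers are skipped.
stale_fp = unwind(app_only, "_int_malloc", 0x0F00)
# The register was reused as scratch space: the walk finds nothing.
garbage_fp = unwind(app_only, "_int_malloc", 0x6)

print(with_fp)
print(stale_fp)
print(garbage_fp)
```

In this model the full chain is only recovered when every frame participates. With a stale register the immediate callers (malloc and work) drop out, and with a reused register _int_malloc ends up with no call parents at all, which resembles the symptoms in the question.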

However, developers are currently working on allowing the perf tools to use DWARF unwind info. This means that frame pointers will no longer be needed to get backtrace information on x86_64. Linus, however, does not want a DWARF unwinder in the kernel. Thus, the kernel will save the user registers and a copy of the user stack with each sample, and the DWARF unwinding will be performed afterwards in the userspace perf tool using the libunwind library.

This technique has been tested to successfully determine callers of (for example) malloc and dynamic_cast. However, the patch set is not yet integrated into the Linux kernel, and needs to undergo further revision before it is ready.

BenRI answered Nov 15 '22 00:11



_int_malloc and __do_dyncast are being called from routines that the profiler can't identify because it doesn't have symbol table information for them.

What's more, it looks like you are showing self (exclusive) time. That is only useful for finding hotspots in routines that a) have much self time, and b) you can fix.

There's a reason profilers subsequent to the original unix profil were created. Real software consists of functions that spend nearly all their time calling other functions, and you need to be able to find code that is on the stack much of the time, not just code that holds the program counter much of the time.

So you need to configure perf to take stack samples and tell you the percent of time each of your routines is on the stack. It is even better if it reports not just routines, but lines of code, as in Zoom. It is best to take the samples on wall-clock time, so you're not blind to IO.
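The difference between self time and on-stack (inclusive) time can be sketched in a few lines of Python. This is a minimal illustration, assuming stack samples are already available as lists of function names ordered from outermost caller to innermost frame; the sample data and the names load_scene, parse, and render are hypothetical.

```python
from collections import Counter

def self_and_inclusive(samples):
    """Return (self %, inclusive %) per function from a list of stack samples."""
    self_counts = Counter()
    inclusive_counts = Counter()
    total = 0
    for stack in samples:
        if not stack:
            continue  # ignore empty samples
        total += 1
        self_counts[stack[-1]] += 1      # innermost frame holds the program counter
        for func in set(stack):          # count once per sample, even if recursive
            inclusive_counts[func] += 1
    if total == 0:
        return {}, {}
    self_pct = {f: 100.0 * n / total for f, n in self_counts.items()}
    incl_pct = {f: 100.0 * n / total for f, n in inclusive_counts.items()}
    return self_pct, incl_pct

# Hypothetical samples: outermost caller first, innermost frame last.
samples = [
    ["main", "load_scene", "malloc", "_int_malloc"],
    ["main", "load_scene", "malloc", "_int_malloc"],
    ["main", "load_scene", "parse"],
    ["main", "render"],
]
self_pct, incl_pct = self_and_inclusive(samples)

for func in sorted(incl_pct, key=incl_pct.get, reverse=True):
    print(f"{func:12s} self={self_pct.get(func, 0.0):5.1f}%  on-stack={incl_pct[func]:5.1f}%")
```

In this made-up data load_scene never holds the program counter, so a self-time view does not list it at all, yet it is on the stack in three of the four samples and is the routine one could actually change.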

There's more to say on all this.

Mike Dunlavey answered Nov 14 '22 23:11
