Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting user-space stack information from perf

I'm currently trying to track down some phantom I/O in a PostgreSQL build I'm testing. It's a multi-process server and it isn't simple to associate disk I/O back to a particular back-end and query.

I thought Linux's perf tool would be ideal for this, but I'm struggling to capture block I/O performance counter metrics and associate them with user-space activity.

It's easy to record block I/O requests and completions with, eg:

sudo perf record -g -T -u postgres -e 'block:block_rq_*'

and the user-space pid is recorded, but there's no kernel or user-space stack captured, or ability to snapshot bits of the user-space process's heap (say, query text) etc. So while you have the pid, you don't know what the process was doing at that point. Just perf script output like:

postgres  7462 [002] 301125.113632: block:block_rq_issue: 8,0 W 0 () 208078848 + 1024 [postgres]

If I add the -g flag to perf record it'll take snapshots of the kernel stack, but doesn't capture user-space state for perf events captured in the kernel. The user-space stack only goes up to the entry-point from userspace, like LWLockRelease, LWLockAcquire, memcpy (mmap'd IO), __GI___libc_write, etc.

So. Any tips? Being able to capture a snapshot of the user-space stack in response to kernel events would be ideal.

I'm on Fedora 19, 3.11.3-201.fc19.x86_64, Schrödinger’s Cat, with perf version 3.10.9-200.fc19.x86_64.

like image 934
Craig Ringer Avatar asked Nov 01 '13 02:11

Craig Ringer


1 Answers

OK, looks like there are several parts to this:

  • I'm on x86_64, where most distros build with -fomit-frame-pointer by default, and perf can't follow the stack without frame pointers;

  • .... unless it's a newer version built with libunwind support, in which case it supports perf record -g dwarf.

See:

  • the patch adding libunwind support to Perf
  • Debian bug 725075.
  • linux perf: how to interpret and find hotspots

I'm on Fedora 18, but the same issue applies. So if you're profiling code you're working on (as is likely on Stack Overflow), rebuild with -fno-omit-frame-pointer and -ggdb.

I landed up rebuilding perf because I wanted to be able to compare to the stock RPMs:

  • sudo yum build-dep perf
  • sudo yum install yum-utils rpmdevtools libunwind-devel
  • yumdownloader --source perf or download the appropriate kernel-.....src.rpm srpm
  • rpmdev-setuptree
  • rpm -Uvh kernel-*.src.rpm
  • cd $HOME/rpmbuild/SPECS
  • rpmbuild -bp --target=$(uname -m) kernel.spec

At this point you can just build a new perf if you want:

  • cd $HOME/rpmbuild/BUILD/kernel-*/linux-*/tools/perf
  • make

... which I did and tested that the updated perf does in fact capture a useful stack if built with libunwind available.

You can also build a new rpm:

  • edit kernel.spec, uncomment the line %define buildid ..., change buildid to something like .perfunwind. Note it's %define not % define.

  • In the same spec file, find:

    %global perf_make \
    make %{?_smp_mflags} -C tools/perf -s V=1 WERROR=0 NO_LIBUNWIND=1 HAVE_CPLUS_DEMANGLE=1 NO_GTK2=1 NO_LIBNUMA=1 NO_STRLCPY=1 prefix=%{_prefix}
    

    and delete NO_LIBUNWIND=1

  • rpmbuild -bb --without up --without mp --without pae --without debug --without doc --without headers --without debuginfo --without bootwrapper --without with_vdso_install --with perf kernel.spec to produce new perf RPMs without building the whole kernel. Or if you want, omit the --without for the kernel flavour you want, in which case you'll also want to build headers, debuginfo, etc.

  • sudo rpm -Uvh $HOME/rpmbuild/RPMS/x86_64/perf-*.fc19.x86_64.rpm

See the fedora project guide on building a custom kernel.

I've reported the issue to Fedora; they shouldn't be using NO_LIBUNWIND=1. See bug 1025603.

Once you have a rebuilt perf you can use perf record -g dwarf to get full stacks.

like image 171
Craig Ringer Avatar answered Oct 07 '22 16:10

Craig Ringer