I'm currently trying to track down some phantom I/O in a PostgreSQL build I'm testing. It's a multi-process server and it isn't simple to associate disk I/O back to a particular back-end and query.
I thought Linux's perf
tool would be ideal for this, but I'm struggling to capture block I/O performance counter metrics and associate them with user-space activity.
It's easy to record block I/O requests and completions with, eg:
sudo perf record -g -T -u postgres -e 'block:block_rq_*'
and the user-space pid is recorded, but there's no kernel or user-space stack captured, or ability to snapshot bits of the user-space process's heap (say, query text) etc. So while you have the pid, you don't know what the process was doing at that point. Just perf script
output like:
postgres 7462 [002] 301125.113632: block:block_rq_issue: 8,0 W 0 () 208078848 + 1024 [postgres]
If I add the -g
flag to perf record
it'll take snapshots of the kernel stack, but doesn't capture user-space state for perf events captured in the kernel. The user-space stack only goes up to the entry-point from userspace, like LWLockRelease
, LWLockAcquire
, memcpy
(mmap'd IO), __GI___libc_write
, etc.
So. Any tips? Being able to capture a snapshot of the user-space stack in response to kernel events would be ideal.
I'm on Fedora 19, 3.11.3-201.fc19.x86_64, Schrödinger’s Cat, with perf version 3.10.9-200.fc19.x86_64.
OK, looks like there are several parts to this:
I'm on x86_64, where most distros build with -fomit-frame-pointer
by default, and perf
can't follow the stack without frame pointers;
.... unless it's a newer version built with libunwind
support, in which case it supports perf record -g dwarf
.
See:
I'm on Fedora 18, but the same issue applies. So if you're profiling code you're working on (as is likely on Stack Overflow), rebuild with -fno-omit-frame-pointer
and -ggdb
.
I landed up rebuilding perf
because I wanted to be able to compare to the stock RPMs:
sudo yum build-dep perf
sudo yum install yum-utils rpmdevtools libunwind-devel
yumdownloader --source perf
or download the appropriate kernel-.....src.rpm
srpmrpmdev-setuptree
rpm -Uvh kernel-*.src.rpm
cd $HOME/rpmbuild/SPECS
rpmbuild -bp --target=$(uname -m) kernel.spec
At this point you can just build a new perf
if you want:
cd $HOME/rpmbuild/BUILD/kernel-*/linux-*/tools/perf
make
... which I did and tested that the updated perf
does in fact capture a useful stack if built with libunwind available.
You can also build a new rpm:
edit kernel.spec, uncomment the line %define buildid ...
, change buildid to something like .perfunwind
. Note it's %define
not % define
.
In the same spec file, find:
%global perf_make \
make %{?_smp_mflags} -C tools/perf -s V=1 WERROR=0 NO_LIBUNWIND=1 HAVE_CPLUS_DEMANGLE=1 NO_GTK2=1 NO_LIBNUMA=1 NO_STRLCPY=1 prefix=%{_prefix}
and delete NO_LIBUNWIND=1
rpmbuild -bb --without up --without mp --without pae --without debug --without doc --without headers --without debuginfo --without bootwrapper --without with_vdso_install --with perf kernel.spec
to produce new perf
RPMs without building the whole kernel. Or if you want, omit the --without
for the kernel flavour you want, in which case you'll also want to build headers, debuginfo, etc.
sudo rpm -Uvh $HOME/rpmbuild/RPMS/x86_64/perf-*.fc19.x86_64.rpm
See the fedora project guide on building a custom kernel.
I've reported the issue to Fedora; they shouldn't be using NO_LIBUNWIND=1
. See bug 1025603.
Once you have a rebuilt perf
you can use perf record -g dwarf
to get full stacks.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With