I'm using valgrind (v3.10.0) to hunt down a memory leak in a complex application (a heavily modified build of net-snmp) that is being built as part of a bigger software suite. I am sure there is a leak (the memory footprint of the application grows linearly without bound), but valgrind always reports the following upon termination.
==1139== HEAP SUMMARY:
==1139== in use at exit: 0 bytes in 0 blocks
==1139== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==1139==
==1139== All heap blocks were freed -- no leaks are possible
The total heap usage cannot be zero: there are many, many calls to malloc and free throughout the application. Valgrind is still capable of finding "Invalid Write" errors.
The application in question is being compiled, along with other software packages, with a uclibc-gcc toolchain for the MIPS processor (uclibc v0.9.29) to be flashed onto an embedded device running a busybox (v1.17.2) linux shell. I am running valgrind directly on the device. I use the following options when launching Valgrind:
--tool=memcheck --leak-check=full --undef-value-errors=no --trace-children=yes
Basically, Valgrind doesn't detect any heap usage even though I've used the heap. Why might this be? Are any of my assumptions (below) wrong?
I compiled the simple test program (using the same target and toolchain as the application above) from the Valgrind quick-start tutorial, to see if Valgrind would detect the leak. The final output was the same as above: no heap usage.
Valgrind documentation has the following to say on their FAQ:
If your program is statically linked, most Valgrind tools will only work well if they are able to replace certain functions, such as malloc, with their own versions. By default, statically linked malloc functions are not replaced. A key indicator of this is if Memcheck says "All heap blocks were freed -- no leaks are possible".
The above sounds exactly like my problem, so I checked that the executable is dynamically linked to the C library that contains malloc and free. I used the uclibc toolchain's custom ldd executable (I can't use the native linux ldd) and the output included the following lines:
libc.so.0 => not found (0x00000000)
/lib/ld-uClibc.so.0 => /lib/ld-uClibc.so.0 (0x00000000)
(The reason they're not found is that I'm running this on the x86 host device; the mips target device doesn't have an ldd executable.) Based on my understanding, malloc and free will be in one of these libraries, and they seem to be dynamically linked. I also ran readelf and nm on the executable to confirm that the references to malloc and free are undefined (which is characteristic of a dynamically linked executable).
Additionally, I tried launching Valgrind with the --soname-synonyms=somalloc=NONE option as suggested by the FAQ.
As pointed out by commenters and answerers, Valgrind depends upon usage of LD_PRELOAD. It has been suggested that my toolchain doesn't support this feature. In order to confirm that it does, I followed this example to create a simple test library and load it (I replaced rand() with a function that just returns 42). The test worked, so it would seem that my target supports LD_PRELOAD just fine.
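For readers who want to repeat that check, a minimal sketch of such a test library might look like the following. The file name preload_rand.c and the build and run commands in the comments are assumptions for illustration, not the exact ones used above; on the target they would need the cross toolchain's compiler instead of plain gcc.

```c
/* preload_rand.c: hypothetical LD_PRELOAD test library.
 *
 * Assumed build:  gcc -shared -fPIC -o libpreload_rand.so preload_rand.c
 * Assumed usage:  LD_PRELOAD=./libpreload_rand.so ./some_program_calling_rand
 */
#include <stdlib.h>

/* Same prototype as the C library's rand(). When this library is preloaded,
 * the dynamic linker resolves rand() to this definition first. */
int rand(void)
{
    return 42;
}
```

If a dynamically linked program that prints rand() shows 42 when run with the library preloaded, symbol interposition through LD_PRELOAD is working on that system.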
I'll also include some information from the readelf command which may be useful. Rather than a giant dump, I've trimmed things down to include only what may be relevant.
Dynamic section
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libnetsnmpagent.so.30]
0x00000001 (NEEDED) Shared library: [libnetsnmpmibs.so.30]
0x00000001 (NEEDED) Shared library: [libnetsnmp.so.30]
0x00000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x00000001 (NEEDED) Shared library: [libc.so.0]
0x0000000f (RPATH) Library rpath: [//lib]
Symbol table '.dynsym'
Num: Value Size Type Bind Vis Ndx Name
27: 00404a40 0 FUNC GLOBAL DEFAULT UND free
97: 00404690 0 FUNC GLOBAL DEFAULT UND malloc
First, let's do a real test to see whether something is statically linked.
$ ldd -v /bin/true
linux-vdso.so.1 => (0x00007fffdc502000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0731e11000)
/lib64/ld-linux-x86-64.so.2 (0x00007f07321ec000)
Version information:
/bin/true:
libc.so.6 (GLIBC_2.3) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.3.4) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.14) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.4) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
/lib/x86_64-linux-gnu/libc.so.6:
ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
The second line in the output shows it is dynamically linked to libc, which is what contains malloc.
As for what might be going wrong, I can suggest four things:
Perhaps it's not linked to normal libc, but to some other C library (e.g. uclibc) or something else valgrind is not expecting. The above test will show you exactly what it's linked to. In order for valgrind to work, it uses LD_PRELOAD to wrap the malloc() and free() functions (description of general function wrapping here). If your libc substitute doesn't support LD_PRELOAD or (somehow) the C library's malloc() and free() aren't being used at all (with those names), then valgrind is not going to work. Perhaps you could include the link line used when you build your application.
It is leaking, but it's not allocating memory using malloc(). For instance, it might (unlikely) be doing its own calls to brk(), or (more likely) allocating memory with mmap. You can inspect the process's memory map to find out (the dump below is of cat itself):
$ cat /proc/PIDNUMBERHERE/maps
00400000-0040b000 r-xp 00000000 08:01 805303 /bin/cat
0060a000-0060b000 r--p 0000a000 08:01 805303 /bin/cat
0060b000-0060c000 rw-p 0000b000 08:01 805303 /bin/cat
02039000-0205a000 rw-p 00000000 00:00 0 [heap]
7fbc8f418000-7fbc8f6e4000 r--p 00000000 08:01 1179774 /usr/lib/locale/locale-archive
7fbc8f6e4000-7fbc8f899000 r-xp 00000000 08:01 1573024 /lib/x86_64-linux-gnu/libc-2.15.so
7fbc8f899000-7fbc8fa98000 ---p 001b5000 08:01 1573024 /lib/x86_64-linux-gnu/libc-2.15.so
7fbc8fa98000-7fbc8fa9c000 r--p 001b4000 08:01 1573024 /lib/x86_64-linux-gnu/libc-2.15.so
7fbc8fa9c000-7fbc8fa9e000 rw-p 001b8000 08:01 1573024 /lib/x86_64-linux-gnu/libc-2.15.so
7fbc8fa9e000-7fbc8faa3000 rw-p 00000000 00:00 0
7fbc8faa3000-7fbc8fac5000 r-xp 00000000 08:01 1594541 /lib/x86_64-linux-gnu/ld-2.15.so
7fbc8fca6000-7fbc8fca9000 rw-p 00000000 00:00 0
7fbc8fcc3000-7fbc8fcc5000 rw-p 00000000 00:00 0
7fbc8fcc5000-7fbc8fcc6000 r--p 00022000 08:01 1594541 /lib/x86_64-linux-gnu/ld-2.15.so
7fbc8fcc6000-7fbc8fcc8000 rw-p 00023000 08:01 1594541 /lib/x86_64-linux-gnu/ld-2.15.so
7fffe1674000-7fffe1695000 rw-p 00000000 00:00 0 [stack]
7fffe178d000-7fffe178f000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Note whether the end address of [heap] is actually growing, or whether you are seeing additional mmap entries. Another good indicator of whether valgrind is working is to send a SIGSEGV or similar to the process and see whether you see heap in use on exit.
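Watching the [heap] range by hand gets tedious, so here is a rough sketch of how the check could be automated. The function names parse_heap_line and heap_size are made up for this example, and it assumes the maps format shown in the dump above.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/<pid>/maps. Returns 1 and fills start/end if the
 * line describes the [heap] mapping, 0 otherwise. */
int parse_heap_line(const char *line, unsigned long *start, unsigned long *end)
{
    if (strstr(line, "[heap]") == NULL)
        return 0;
    return sscanf(line, "%lx-%lx", start, end) == 2;
}

/* Size in bytes of the [heap] mapping listed in the given maps file.
 * Returns 0 if there is no [heap] line, -1 if the file cannot be opened. */
long heap_size(const char *maps_path)
{
    char line[512];
    unsigned long start, end;
    long size = 0;
    FILE *f = fopen(maps_path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_heap_line(line, &start, &end)) {
            size = (long)(end - start);
            break;
        }
    }
    fclose(f);
    return size;
}
```

Calling heap_size() on the process's maps file at intervals and logging the result shows whether the brk-managed heap is what grows. If the value stays flat while the footprint rises, the growth is more likely in anonymous mmap regions.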
It isn't leaking in the strict sense, but it is leaking to all intents and purposes. For instance, perhaps it has a data structure (like a cache) which grows over time. On exit, the program (correctly) frees all entries in the cache, so nothing is in use on the heap at exit. In this instance, you'll want to know what is growing. This is a harder proposition. I'd use the technique to kill the program (above), capture the output, and post-process it. If you see 500 things after 24 hours, 1,000 after 48 hours, and 1,500 after 72 hours, that should give you an indication of what is 'leaking'. However, as haris points out in the comments, whilst this would result in the memory not being shown as leaks, it doesn't explain the 'total heap usage' being zero, as this describes the total allocations made and freed.
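To make that scenario concrete, here is a small hypothetical sketch of a cache that grows without bound while running but is released cleanly at shutdown. All names (struct entry, cache_add, cache_size, cache_clear) are invented for illustration and have nothing to do with the net-snmp sources.

```c
#include <stdlib.h>

/* One cached item in a singly linked list. */
struct entry {
    struct entry *next;
    int key;
};

static struct entry *cache_head = NULL;
static size_t cache_count = 0;

/* Add an entry; nothing ever evicts, so the cache only grows.
 * Returns 0 on success, -1 if malloc fails. */
int cache_add(int key)
{
    struct entry *e = malloc(sizeof *e);

    if (e == NULL)
        return -1;
    e->key = key;
    e->next = cache_head;
    cache_head = e;
    cache_count++;
    return 0;
}

/* Number of entries currently held. */
size_t cache_size(void)
{
    return cache_count;
}

/* Free every entry, as a well-behaved program would do on shutdown. */
void cache_clear(void)
{
    while (cache_head != NULL) {
        struct entry *next = cache_head->next;
        free(cache_head);
        cache_head = next;
    }
    cache_count = 0;
}
```

A program that calls cache_add() for its whole lifetime and cache_clear() just before exiting would normally show nothing in use at exit under memcheck, yet its footprint grows steadily. Its total heap usage would still be non-zero, which is why this case alone doesn't match the output in the question.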
Perhaps valgrind is just not working on your platform. What happens if you build a very simple program like the one below, and run valgrind on it on your platform? If this isn't working, you need to find out why valgrind is not operating right. Note that valgrind on MIPS is pretty new. Here is an email thread where a developer with MIPS and uclibc discovers valgrind is not reporting any allocations. His solution is to replace NPTL with linuxthreads.
#include <stdio.h>
#include <stdlib.h>

int
main (int argc, char **argv)
{
  void *p = malloc (100); /* does not leak */
  void *q = malloc (100); /* leaks */
  free (p);
  exit (0);
}
In order to confirm that the executable is not statically linked, I ran file snmpd
Your problem is most likely not that the binary is statically linked (you now know it is not), but that malloc and free are statically linked into it (perhaps you are using an alternative malloc implementation, such as tcmalloc?).
When you built the simple test case (on which Valgrind worked correctly), you likely didn't use the same link command line (and the same libraries) as your real application does.
In any case, it is trivial to check:
readelf -Ws snmpd | grep ' malloc'
If this shows UND (i.e. undefined), then Valgrind should have no trouble intercepting it. But chances are it shows FUNC GLOBAL DEFAULT ... malloc instead, which means that your snmpd is as good as statically linked as far as valgrind is concerned.
Assuming my guess is correct, relink snmpd with the -Wl,-y,malloc flag. That will tell you which library defines your malloc. Remove it from the link, find and fix the leak, then decide whether having that library is worth the trouble it has caused you.