So let's say I have C/C++ code that allocates some memory and returns a pointer to it.
#include <stdlib.h>
#ifdef __cplusplus
extern "C" {
#endif
void Allocate(void **p) {
    int N = 2048;
    *p = malloc(N);
}
#ifdef __cplusplus
}
#endif
I'm expecting that it's my responsibility to free that block of memory, obviously. Now suppose I compile this into a shared library and call it from Python with ctypes, but don't explicitly free that memory.
import ctypes
from ctypes import cdll, Structure, byref
external_lib = cdll.LoadLibrary('libtest.so.1.0')
ptr=ctypes.c_void_p(0)
external_lib.Allocate(ctypes.byref(ptr))
If I run this script with valgrind, I get a memory leak of 2048 bytes if I compile test.cpp without the '-O3' flag. But if I compile it with the '-O3' flag, then I do not get the memory leak.
It's not really a problem - I'll always be careful to explicitly free any memory I allocate. But I'm curious where this behavior comes from.
I tested this with the following script on Linux:
g++ -Wall -c -fPIC -fno-common test.cpp -o libtest1.o
g++ -shared -Wl,-soname,libtest1.so.1 -o libtest1.so.1.0 libtest1.o
g++ -O3 -Wall -c -fPIC -fno-common test.cpp -o libtest2.o
g++ -shared -Wl,-soname,libtest2.so.1 -o libtest2.so.1.0 libtest2.o
valgrind python test1.py &> report1
valgrind python test2.py &> report2
This produced the following output:
report1:
==27875== LEAK SUMMARY:
==27875== definitely lost: 2,048 bytes in 1 blocks
==27875== indirectly lost: 0 bytes in 0 blocks
==27875== possibly lost: 295,735 bytes in 1,194 blocks
==27875== still reachable: 744,633 bytes in 5,025 blocks
==27875== suppressed: 0 bytes in 0 blocks
report2:
==27878== LEAK SUMMARY:
==27878== definitely lost: 0 bytes in 0 blocks
==27878== indirectly lost: 0 bytes in 0 blocks
==27878== possibly lost: 295,735 bytes in 1,194 blocks
==27878== still reachable: 746,681 bytes in 5,026 blocks
==27878== suppressed: 0 bytes in 0 blocks
Different users appear to obtain different results depending on their platform. I've tried, unsuccessfully, to reproduce this issue on a Debian Wheezy system with Python 2.5.5, Python 2.6.8, and Python 3.2.3, using g++ 4.7.2.
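Based on your code, you know that it's leaky; valgrind is just reporting the memory differently. In report 1, there is definitely no remaining reference to the 2,048-byte chunk. In report 2, it is counted in the still reachable section: that total is exactly 2,048 bytes and one block larger than in report 1 (746,681 bytes in 5,026 blocks versus 744,633 bytes in 5,025 blocks).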
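As a minimal sketch of the non-leaky pattern, the snippet below pairs an out-parameter allocation with an explicit free from Python. Because your library is not available here, it uses the C library's posix_memalign (which also takes a void ** out-parameter) as a stand-in for Allocate; the helper name allocate_and_free is hypothetical, and the code assumes a POSIX system. With your own library you would need to export a matching free function, or call free from the same C runtime.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. If the C library cannot be located by name,
# find_library returns None and CDLL(None) falls back to the symbols that
# are already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# int posix_memalign(void **memptr, size_t alignment, size_t size);
libc.posix_memalign.argtypes = [
    ctypes.POINTER(ctypes.c_void_p),
    ctypes.c_size_t,
    ctypes.c_size_t,
]
libc.posix_memalign.restype = ctypes.c_int

# void free(void *ptr);
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def allocate_and_free(size=2048):
    """Allocate a block through an out-parameter, then release it."""
    ptr = ctypes.c_void_p(0)
    # Stand-in for external_lib.Allocate(ctypes.byref(ptr)).
    status = libc.posix_memalign(ctypes.byref(ptr), 16, size)
    if status != 0:
        raise MemoryError("posix_memalign failed with status %d" % status)
    address = ptr.value
    libc.free(ptr)    # Python's garbage collector never does this for you
    ptr.value = None  # avoid keeping a dangling address around
    return address


if __name__ == "__main__":
    print(hex(allocate_and_free()))
```

Setting argtypes and restype matters on 64-bit systems, because ctypes otherwise treats arguments and return values as C int and can truncate pointers.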
The valgrind leak detector documentation describes how leaks are detected. It's interesting to note that it looks for references in both memory and the general-purpose register set of each thread. It is conceivable (though I would have thought unlikely) that when the leak detector runs on program exit, one of the CPU registers still holds a reference to the allocated memory. In the unoptimised version, the Allocate function may contain additional instructions that clobber any register holding the leaked reference. In the optimised version, the Allocate function may retain a reference in a register as well as storing the result in *p.
Of course, without being able to reproduce this, it's all a guess. You can ask valgrind to output more information about the references it finds, which may provide more insight into the allocated blocks.
For example, this will show both reachable and unreachable blocks:
valgrind --show-reachable=yes --leak-check=full python2.5 test1.py &> report1-2.5
If I modify your code as follows, all tests on my system indicate that one 2,048-byte block is definitely lost (even though 4,096 bytes have been allocated in total). This also leads me to believe that valgrind's leak detector could be picking up some kind of cached register value.
import ctypes
from ctypes import cdll, Structure, byref
external_lib = cdll.LoadLibrary('libtest.so.1.0')
ptr=ctypes.c_void_p(0)
external_lib.Allocate(ctypes.byref(ptr))
external_lib.Allocate(ctypes.byref(ptr)) # <-- Allocate a second block, the first becomes lost.
Here's the resulting snippet from valgrind showing both a reachable and unreachable block:
==28844== 2,048 bytes in 1 blocks are still reachable in loss record 305 of 366
==28844== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28844== by 0x6CD870F: Allocate (in /projects/stack-overflow/18929183-python-garbage-collector-behavior-with-ctypes/libtest1.so.1.0)
==28844== by 0x6ACEDEF: ffi_call_unix64 (in /usr/lib/python2.6/lib-dynload/_ctypes.so)
==28844== by 0x6ACE86A: ffi_call (in /usr/lib/python2.6/lib-dynload/_ctypes.so)
==28844== by 0x6AC9A66: _CallProc (callproc.c:816)
==28844== by 0x6AC136C: CFuncPtr_call (_ctypes.c:3860)
==28844== by 0x424989: PyObject_Call (abstract.c:2492)
==28844== by 0x4A17B8: PyEval_EvalFrameEx (ceval.c:3968)
==28844== by 0x49F0D1: PyEval_EvalCodeEx (ceval.c:3000)
==28844== by 0x49F211: PyEval_EvalCode (ceval.c:541)
==28844== by 0x4C66FE: PyRun_FileExFlags (pythonrun.c:1358)
==28844== by 0x4C7A36: PyRun_SimpleFileExFlags (pythonrun.c:948)
==28844==
==28844== 2,048 bytes in 1 blocks are definitely lost in loss record 306 of 366
==28844== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28844== by 0x6CD870F: Allocate (in /projects/stack-overflow/18929183-python-garbage-collector-behavior-with-ctypes/libtest1.so.1.0)
==28844== by 0x6ACEDEF: ffi_call_unix64 (in /usr/lib/python2.6/lib-dynload/_ctypes.so)
==28844== by 0x6ACE86A: ffi_call (in /usr/lib/python2.6/lib-dynload/_ctypes.so)
==28844== by 0x6AC9A66: _CallProc (callproc.c:816)
==28844== by 0x6AC136C: CFuncPtr_call (_ctypes.c:3860)
==28844== by 0x424989: PyObject_Call (abstract.c:2492)
==28844== by 0x4A17B8: PyEval_EvalFrameEx (ceval.c:3968)
==28844== by 0x49F0D1: PyEval_EvalCodeEx (ceval.c:3000)
==28844== by 0x49F211: PyEval_EvalCode (ceval.c:541)
==28844== by 0x4C66FE: PyRun_FileExFlags (pythonrun.c:1358)
==28844== by 0x4C7A36: PyRun_SimpleFileExFlags (pythonrun.c:948)
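To make the two-allocation case concrete, here is a rough sketch of why the first block becomes unreachable from Python: the second call overwrites the only copy of the first address held in ptr. As an assumption, posix_memalign from the C library again stands in for Allocate, and the helper name allocate_twice is hypothetical; the code assumes a POSIX system. Saving each address before the next call is one way to keep both blocks freeable.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system; CDLL(None) is the fallback if lookup fails.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.posix_memalign.argtypes = [
    ctypes.POINTER(ctypes.c_void_p),
    ctypes.c_size_t,
    ctypes.c_size_t,
]
libc.posix_memalign.restype = ctypes.c_int
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def allocate_twice(size=2048):
    """Allocate two blocks through the same out-parameter and free both."""
    ptr = ctypes.c_void_p(0)
    addresses = []
    for _ in range(2):
        # Stand-in for external_lib.Allocate(ctypes.byref(ptr)).
        if libc.posix_memalign(ctypes.byref(ptr), 16, size) != 0:
            raise MemoryError("posix_memalign failed")
        # Copy the address out now; the next call overwrites ptr.
        addresses.append(ptr.value)
    last_seen = ptr.value  # only the second address is still held in ptr
    for address in addresses:
        libc.free(address)
    return addresses, last_seen


if __name__ == "__main__":
    saved, last = allocate_twice()
    print([hex(a) for a in saved], hex(last))
```

Without the saved list, only the second address would survive in ptr, which matches the report above: one block still reachable and one definitely lost.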