Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Debugging a Python/NumPy memory leak

I am trying to find the origin of a nasty memory leak in a Python/NumPy program using C/Cython extensions and multiprocessing.

Each subprocess processes a list of images, and for each of them sends the output array (which usually is about 200-300MB large) through a Queue to the main process. Fairly standard map/reduce setup.

As you can imagine a memory leak can take ginormous proportions with arrays this large, and having multiple processes happily go over 20GB RAM when they would only need 5-6GB is... annoying.

  • I have tried running a debug build of Python through Valgrind, and quadruple-checked my extensions for memory leaks, but found nothing.

  • I have checked my Python code for dangling references to my arrays, and also used NumPy's allocation tracker to check whether my arrays were indeed released. They were.

The last thing I did was attaching GDB onto one of my processes (this bad boy is right now running at 27GB RAM and counting) and dumping a large part of the heap to the disk. Much to my surprise, the dumped file was full of zeroes! About 7G worth of zeroes.

Is this a standard memory allocation behavior in Python/NumPy? Did I miss something obvious that would explain having so much memory used for nothing? How can I manage the memory properly?


EDIT: For the record, I'm running NumPy 1.7.1 and Python 2.7.3.

EDIT 2: I've been monitoring the process with strace, and it seems that it keeps on increasing the break point of each process (using the brk() syscall).

Is CPython actually releasing memory properly? What about C extensions, NumPy arrays? Who decides when to call brk(), is it Python itself or is it an underlying library (libc,...)?

Below is a sample strace log with comments, from one iteration (i.e. one input image set). Note that the break point keeps on increasing, but I made sure (with objgraph) that no meaningful NumPy arrays are kept inside the Python interpreter.

# Reading .inf files with metadata
# Pretty small, no brk()
open("1_tif_all/AIR00642_1.inf", O_RDONLY) = 6
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9387fff000
munmap(0x7f9387fff000, 4096)            = 0
open("1_tif_all/AIR00642_2.inf", O_RDONLY) = 6
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9387fff000
munmap(0x7f9387fff000, 4096)            = 0
open("1_tif_all/AIR00642_3.inf", O_RDONLY) = 6
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9387fff000
munmap(0x7f9387fff000, 4096)            = 0
open("1_tif_all/AIR00642_4.inf", O_RDONLY) = 6
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9387fff000
munmap(0x7f9387fff000, 4096)            = 0

# This is where I'm starting the heavy processing
write(2, "[INFO/MapProcess-1] Shot 642: Da"..., 68) = 68
write(2, "[INFO/MapProcess-1] Shot 642: Vi"..., 103) = 103
write(2, "[INFO/MapProcess-1] Shot 642: Re"..., 66) = 66

# I'm opening a .tif image (752 x 480, 8-bit, 1 channel)
open("1_tif_all/AIR00642_3.tif", O_RDONLY) = 6
read(6, "II*\0JC\4\0", 8)               = 8
mmap(NULL, 279600, PROT_READ, MAP_SHARED, 6, 0) = 0x7f9387fbb000
munmap(0x7f9387fbb000, 279600)          = 0
write(2, "[INFO/MapProcess-1] Shot 642: Pr"..., 53) = 53

# Another .tif
open("1_tif_all/AIR00642_4.tif", O_RDONLY) = 6
read(6, "II*\0\266\374\3\0", 8)         = 8
mmap(NULL, 261532, PROT_READ, MAP_SHARED, 6, 0) = 0x7f9387fc0000
munmap(0x7f9387fc0000, 261532)          = 0
write(2, "[INFO/MapProcess-1] Shot 642: Pr"..., 51) = 51
brk(0x1aea97000)                        = 0x1aea97000

# Another .tif
open("1_tif_all/AIR00642_1.tif", O_RDONLY) = 6
read(6, "II*\0\220\253\4\0", 8)         = 8
mmap(NULL, 306294, PROT_READ, MAP_SHARED, 6, 0) = 0x7f9387fb5000
munmap(0x7f9387fb5000, 306294)          = 0
brk(0x1af309000)                        = 0x1af309000
write(2, "[INFO/MapProcess-1] Shot 642: Pr"..., 53) = 53
brk(0x1b03da000)                        = 0x1b03da000

# Another .tif
open("1_tif_all/AIR00642_2.tif", O_RDONLY) = 6
mmap(NULL, 345726, PROT_READ, MAP_SHARED, 6, 0) = 0x7f9387fab000
munmap(0x7f9387fab000, 345726)          = 0
brk(0x1b0c42000)                        = 0x1b0c42000
write(2, "[INFO/MapProcess-1] Shot 642: Pr"..., 51) = 51

# I'm done reading my images
write(2, "[INFO/MapProcess-1] Shot 642: Fi"..., 72) = 72

# Allocating some more arrays for additional variables
# Increases by about 8M at a time
brk(0x1b1453000)                        = 0x1b1453000
brk(0x1b1c63000)                        = 0x1b1c63000
brk(0x1b2473000)                        = 0x1b2473000
brk(0x1b2c84000)                        = 0x1b2c84000
brk(0x1b3494000)                        = 0x1b3494000
brk(0x1b3ca5000)                        = 0x1b3ca5000

# What are these mmap calls doing here?
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9377df1000
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9367be2000
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f93579d3000
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f93477c4000
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f93375b5000
munmap(0x7f93579d3000, 270594048)       = 0
munmap(0x7f93477c4000, 270594048)       = 0
mmap(NULL, 270594048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f93579d3000
munmap(0x7f93375b5000, 270594048)       = 0
mmap(NULL, 50737152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9354970000
munmap(0x7f9354970000, 50737152)        = 0
brk(0x1b4cc6000)                        = 0x1b4cc6000
brk(0x1b5ce7000)                        = 0x1b5ce7000

EDIT 3: Is freeing handled differently for small/large numpy arrays? may be relevant. I'm getting increasingly convinced that I am simply allocating too many arrays that do not get released to the system because it is, indeed, standard behavior. Will try to allocate my arrays beforehand and reuse them as needed.

like image 782
F.X. Avatar asked Nov 11 '22 21:11

F.X.


1 Answers

Doh. I really should have checked those C extensions a fifth time.

I had forgotten to decrement a reference count in one of the temporary NumPy arrays I had allocated from C. The array didn't leave the C code, so I didn't see that I needed to deallocate it.

I still have no idea why it didn't appear in objgraph.

like image 189
F.X. Avatar answered Nov 15 '22 01:11

F.X.