
numpy.ndarray objects not garbage collected

While trying to track down some memory leaks in the Python bindings for some C/C++ functions, I came across some strange behavior pertaining to the garbage collection of NumPy arrays.

I have created a couple of simplified cases to better explain the behavior. The code was run under memory_profiler, and its output follows. It appears that Python's garbage collection is not working as expected when it comes to NumPy arrays:

# File deallocate_ndarray.py
@profile
def ndarray_deletion():
    import numpy as np
    from gc import collect
    buf = 'abcdefghijklmnopqrstuvwxyz' * 10000
    arr = np.frombuffer(buf)
    del arr
    del buf
    collect()
    y = [i**2 for i in xrange(10000)]
    del y
    collect()

if __name__=='__main__':
    ndarray_deletion()

With the following command I invoked the memory_profiler:

python -m memory_profiler deallocate_ndarray.py

This is what I got:

Filename: deallocate_ndarray.py
Line #    Mem usage    Increment   Line Contents
================================================
 5   10.379 MiB    0.000 MiB   @profile
 6                             def ndarray_deletion():
 7   17.746 MiB    7.367 MiB       import numpy as np
 8   17.746 MiB    0.000 MiB       from gc import collect
 9   17.996 MiB    0.250 MiB       buf = 'abcdefghijklmnopqrstuvwxyz' * 10000
10   18.004 MiB    0.008 MiB       arr = np.frombuffer(buf)
11   18.004 MiB    0.000 MiB       del arr
12   18.004 MiB    0.000 MiB       del buf
13   18.004 MiB    0.000 MiB       collect()
14   18.359 MiB    0.355 MiB       y = [i**2 for i in xrange(10000)]
15   18.359 MiB    0.000 MiB       del y
16   18.359 MiB    0.000 MiB       collect()

I don't understand why even the forced calls to collect() don't reduce the program's memory usage. Moreover, even if NumPy arrays don't behave normally due to the underlying C constructs, why doesn't the list (which is pure Python) get garbage collected?

I know that del does not directly call the underlying __del__ method, but note that every del statement in the code reduces the reference count of the corresponding object to zero (thereby making it eligible for garbage collection, AFAIK). Typically, I would expect to see a negative entry in the Increment column when an object is garbage collected. Can anyone shed some light on what is going on here?
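The reference-counting claim above can be checked in isolation from any memory measurement. The following is a minimal sketch, assuming CPython; Payload and freed_by_del_alone are hypothetical names introduced only for illustration. It uses a weak reference as a probe and disables the cyclic collector, so if the object disappears it can only be because del dropped the last reference.

```python
import gc
import weakref


class Payload(object):
    """Hypothetical stand-in for any object that supports weak references."""


def freed_by_del_alone():
    """Return (alive_before_del, alive_after_del) for a Payload instance."""
    was_enabled = gc.isenabled()
    gc.disable()  # rule out the cyclic collector; only refcounting remains
    try:
        obj = Payload()
        probe = weakref.ref(obj)  # does not keep obj alive
        alive_before = probe() is not None
        del obj  # drops the only strong reference
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(freed_by_del_alone())
```

On CPython the second value should be False, which suggests the objects are being released and that the flat profile has more to do with how memory usage is measured than with collection itself.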

NOTE: This test was run on OS X 10.10.4, Python 2.7.10 (conda), NumPy 1.9.2 (conda), memory_profiler 0.33 (conda-binstar), psutil 2.2.1 (conda).

asked Sep 27 '22 08:09 by prussian_metal


1 Answer

In order to see the memory being reclaimed, I had to increase the size of buf by several orders of magnitude. Maybe the original size is too small for memory_profiler to detect the change (it queries the OS, so its measurements are not very precise), or maybe it's too small for the Python garbage collector to care; I don't know.

For example, replacing the factor 10000 with 100000000 in the definition of buf yields:

Line #    Mem usage    Increment   Line Contents
================================================
21   10.289 MiB    0.000 MiB   @profile
22                             def ndarray_deletion():
23   17.309 MiB    7.020 MiB       import numpy as np
24   17.309 MiB    0.000 MiB       from gc import collect
25 2496.863 MiB 2479.555 MiB       buf = 'abcdefghijklmnopqrstuvwxyz' * 100000000
26 2496.867 MiB    0.004 MiB       arr = np.frombuffer(buf)
27 2496.867 MiB    0.000 MiB       del arr
28   17.312 MiB -2479.555 MiB       del buf
29   17.312 MiB    0.000 MiB       collect()
30   17.719 MiB    0.406 MiB       y = [i**2 for i in xrange(10000)]
31   17.719 MiB    0.000 MiB       del y
32   17.719 MiB    0.000 MiB       collect()
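
One way to probe the "too small to detect" hypothesis is to measure at the level of Python's allocator rather than the OS. The sketch below is an assumption-laden illustration, not part of the original test: it uses the standard-library tracemalloc module, which exists only in Python 3.4+ (so it would not run on the Python 2.7 setup in the question), and traced_sizes is a hypothetical helper name. It reports how many traced bytes are held after allocating the original, small buf and again after del buf.

```python
import tracemalloc


def traced_sizes(n_repeats=10000):
    """Return (bytes_added_by_buf, bytes_left_after_del) as seen by tracemalloc."""
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        # Same pattern as in the question, as a bytes object for Python 3.
        buf = b"abcdefghijklmnopqrstuvwxyz" * n_repeats
        after_alloc, _ = tracemalloc.get_traced_memory()
        del buf  # the last reference goes away here
        after_del, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after_alloc - baseline, after_del - baseline


if __name__ == "__main__":
    grown, remaining = traced_sizes()
    print("added by buf: %d bytes, left after del: %d bytes" % (grown, remaining))
```

If the second number falls back to nearly zero while memory_profiler still shows a flat line, that would be consistent with the memory having been freed inside the process without the process-level figure shrinking, since an allocator may keep small freed blocks around for reuse instead of returning them to the OS.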

answered Oct 06 '22 20:10 by Fabian Pedregosa