While trying to track down some memory leaks in the Python bindings for some C/C++ functions, I came across some strange behavior pertaining to the garbage collection of NumPy arrays. I have created a couple of simplified cases in order to better explain the behavior. The code was run using memory_profiler, the output of which follows immediately after. It appears that Python's garbage collection is not working as expected when it comes to NumPy arrays:
# File deallocate_ndarray.py
@profile
def ndarray_deletion():
    import numpy as np
    from gc import collect
    buf = 'abcdefghijklmnopqrstuvwxyz' * 10000
    arr = np.frombuffer(buf)   # no copy: arr is a float64 view onto buf's bytes
    del arr
    del buf
    collect()
    y = [i**2 for i in xrange(10000)]   # pure-Python object for comparison
    del y
    collect()

if __name__ == '__main__':
    ndarray_deletion()
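Note that np.frombuffer does not copy the data: the resulting array is a view that keeps a reference to the string, so the string cannot be freed while the array is alive. A quick sketch to confirm this, assuming the same Python 2 / NumPy setup as above (on NumPy 1.9 the array's base is the buffer object itself):

import numpy as np

buf = 'abcdefghijklmnopqrstuvwxyz' * 10000
arr = np.frombuffer(buf)

print arr.base is buf       # True: the array references the original string
print arr.flags.owndata     # False: the array does not own its memory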
I invoked memory_profiler with the following command:
python -m memory_profiler deallocate_ndarray.py
This is what I got:
Filename: deallocate_ndarray.py
Line # Mem usage Increment Line Contents
================================================
5 10.379 MiB 0.000 MiB @profile
6 def ndarray_deletion():
7 17.746 MiB 7.367 MiB import numpy as np
8 17.746 MiB 0.000 MiB from gc import collect
9 17.996 MiB 0.250 MiB buf = 'abcdefghijklmnopqrstuvwxyz' * 10000
10 18.004 MiB 0.008 MiB arr = np.frombuffer(buf)
11 18.004 MiB 0.000 MiB del arr
12 18.004 MiB 0.000 MiB del buf
13 18.004 MiB 0.000 MiB collect()
14 18.359 MiB 0.355 MiB y = [i**2 for i in xrange(10000)]
15 18.359 MiB 0.000 MiB del y
16 18.359 MiB 0.000 MiB collect()
I don't understand why even the forced calls to collect() don't reduce the memory usage of the program by freeing up some memory. Moreover, even if NumPy arrays don't behave normally due to the underlying C constructs, why doesn't the list (which is pure Python) get garbage collected?
I know that del does not directly call the underlying __del__ method, but you will note that every del statement in the code ends up reducing the reference count of the corresponding object to zero (thereby making it eligible for garbage collection, AFAIK). Typically, I would expect to see a negative entry in the Increment column when an object is reclaimed. Can anyone shed some light on what is going on here?
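For what it's worth, the refcount claim can be checked directly with sys.getrefcount. This is just an illustrative sketch; getrefcount itself adds one temporary reference, so every count it reports is one higher than the "real" one:

import sys
import numpy as np

buf = 'abcdefghijklmnopqrstuvwxyz' * 10000
arr = np.frombuffer(buf)

print sys.getrefcount(buf)   # 3: the name buf, the array's reference, the argument
print sys.getrefcount(arr)   # 2: the name arr and the argument

del arr
print sys.getrefcount(buf)   # 2: only buf and the argument remain
# Once 'del buf' drops the count to zero, CPython frees the string
# immediately; gc.collect() is only needed for reference cycles.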
NOTE: This test was run on OS X 10.10.4, Python 2.7.10 (conda), Numpy 1.9.2 (conda), Memory Profiler 0.33 (conda-binstar), psutil 2.2.1 (conda).
In order to see the memory given back to the OS, I had to increase the size of buf by several orders of magnitude. Maybe the original allocation is too small for memory_profiler to detect the change (it queries the OS, so its measurements are not very precise), or maybe it is too small for the allocator to care: freed blocks of that size tend to be kept on free lists (CPython's and the C library's) for reuse rather than handed back to the OS, whereas a multi-gigabyte block is unmapped, and therefore visible to the profiler, as soon as it is freed.
For example, replacing 10000 with 100000000 in the line that builds buf yields:
Line # Mem usage Increment Line Contents
================================================
21 10.289 MiB 0.000 MiB @profile
22 def ndarray_deletion():
23 17.309 MiB 7.020 MiB import numpy as np
24 17.309 MiB 0.000 MiB from gc import collect
25 2496.863 MiB 2479.555 MiB buf = 'abcdefghijklmnopqrstuvwxyz' * 100000000
26 2496.867 MiB 0.004 MiB arr = np.frombuffer(buf)
27 2496.867 MiB 0.000 MiB del arr
28 17.312 MiB -2479.555 MiB del buf
29 17.312 MiB 0.000 MiB collect()
30 17.719 MiB 0.406 MiB y = [i**2 for i in xrange(10000)]
31 17.719 MiB 0.000 MiB del y
32 17.719 MiB 0.000 MiB collect()
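The same effect can be reproduced without memory_profiler by polling the process RSS with psutil (already installed, per the NOTE above). A rough sketch; the exact numbers will vary, since RSS is an OS-level measurement:

import os
import psutil

proc = psutil.Process(os.getpid())

def rss_mib():
    # Resident set size of this process, in MiB.
    return proc.memory_info().rss / float(2 ** 20)

print 'start:       %9.1f MiB' % rss_mib()

buf = 'abcdefghijklmnopqrstuvwxyz' * 100000000   # ~2.4 GiB
print 'after alloc: %9.1f MiB' % rss_mib()

del buf   # refcount hits zero; the huge block is released back to the OS
print 'after del:   %9.1f MiB' % rss_mib()

If you repeat this with the small 10000-times string from the question, the "after del" figure typically does not move, which matches the memory_profiler output above.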