
NumPy memory leak in np.ones?

One of my students showed me the following test case, which shows an apparent memory leak in NumPy. I'm wondering whether the memory profiler is correct here, or what's going on. Here's the test case:

from memory_profiler import profile
import numpy as np
import gc

@profile
def test():
    arr = np.ones((10000, 6912))
    for i in range(2000):
        arr[0:75,:] = np.ones((75, 6912))
    del arr
    gc.collect()
    pass

test()

This produces the following output:

Filename: test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     5     32.9 MiB     32.9 MiB           1   @profile
     6                                         def test():
     7    560.3 MiB    527.4 MiB           1       arr = np.ones((10000, 6912))
     8    564.2 MiB      0.0 MiB        2001       for i in range(2000):
     9    564.2 MiB      3.9 MiB        2000           arr[0:75,:] = np.ones((75, 6912))
    10     37.0 MiB   -527.3 MiB           1       del arr
    11     37.0 MiB     -0.0 MiB           1       gc.collect()
    12     37.0 MiB      0.0 MiB           1       pass

It looks like the line with np.ones((75, 6912)) is slowly leaking memory (about 4 MiB in total here). If we replace that expression with just 1, the apparent leak disappears.
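
For reference, this is the variant of the loop body I mean, with the right-hand side replaced by a scalar so that no temporary array is created (my paraphrase of the replacement described above):

for i in range(2000):
    arr[0:75,:] = 1   # scalar assignment, no temporary np.ones((75, 6912)) on the right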

I've tested this on Python 3.8.10 and 3.9.5 with Numpy versions 1.21.3 (latest at time of writing) and 1.20.3 and memory_profiler version 0.58.0 (latest at time of writing). My operating system is Ubuntu Linux 20.04 LTS; my student demonstrated this on macOS (not sure which version).

What's going on?

asked Oct 22 '21 by Alex Reinking


1 Answer

The short answer, which does not yet add anything new to the conversation, is that @hpaulj is right: there is no significant leak of anywhere near 4.1 MiB per call to test(). What is happening is that not all memory that gets allocated is returned to the OS when it is freed. The reason is that both python's arena-based allocator and libc malloc request memory from the OS in large ranges and then carve those ranges up into smaller regions to satisfy allocation requests. A large range typically cannot be released while any part of it is still in use; for example, a python arena cannot be freed while any allocation from that arena is still live.
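
As a rough illustration of that last point, here is a minimal, Linux-specific sketch (it reads VmSize from /proc/self/status, which is an assumption about the platform and not part of the original test case). Freeing a large batch of small objects often does not bring the process size all the way back down:

import re

def vm_size_kb():
    # Parse the "VmSize:" line of /proc/self/status (the kernel reports it in kB).
    with open('/proc/self/status') as f:
        return int(re.search(r'VmSize:\s+(\d+) kB', f.read()).group(1))

before = vm_size_kb()
junk = [object() for _ in range(1_000_000)]   # many small allocations in python arenas
grown = vm_size_kb()
del junk                                       # everything is freed from python's point of view
after = vm_size_kb()
print(before, grown, after)   # 'after' often stays above 'before': freed arena/heap pages
                              # tend to be kept around for reuse rather than returned to the OS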

You can make some tiny modifications to your program to see that test() is not leaking 4.1 MiB per call. For example, suppose you change the last line into two lines:

while True:
    test()

If you then run the program and watch its virtual address space (for example, using top or ps), you will see that it stops increasing almost immediately after the first run of test().
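
If you prefer to watch this from inside the program rather than with an external tool, one option (a sketch, not part of the original test; ru_maxrss is the peak resident set size, reported in KiB on Linux and in bytes on macOS) is:

import resource

while True:
    test()
    # Peak RSS only ever grows; after the first iteration this number typically
    # stops changing, which matches the behaviour described above.
    print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)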

You can see this even with the metrics provided by memory_profiler, by changing your original program so that it just calls test() twice, as in:

test()
test()

If you then run your program, you will see that the reported growth occurs only during the first call:

tim@tim-OptiPlex-3020:~$ python3 so3.py
Filename: so3.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     5     32.9 MiB     32.9 MiB           1   @profile
     6                                         def test():
     7    560.0 MiB    527.1 MiB           1       arr = np.ones((10000, 6912))
     8    564.1 MiB      0.0 MiB        2001       for i in range(2000):
     9    564.1 MiB      4.1 MiB        2000           arr[0:75,:] = np.ones((75, 6912))
    10     36.9 MiB   -527.3 MiB           1       del arr
    11     36.8 MiB     -0.0 MiB           1       gc.collect()
    12     36.8 MiB      0.0 MiB           1       pass


Filename: so3.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     5     36.8 MiB     36.8 MiB           1   @profile
     6                                         def test():
     7    564.1 MiB    527.3 MiB           1       arr = np.ones((10000, 6912))
     8    564.1 MiB      0.0 MiB        2001       for i in range(2000):
     9    564.1 MiB      0.0 MiB        2000           arr[0:75,:] = np.ones((75, 6912))
    10     36.8 MiB   -527.3 MiB           1       del arr
    11     36.8 MiB      0.0 MiB           1       gc.collect()
    12     36.8 MiB      0.0 MiB           1       pass

So the next question you might ask is why the memory grows during the first call to test() but apparently not during the second. To answer that question we can use https://github.com/vmware/chap, which is open source and which your students can build on Linux.

As input, chap generally just requires a core file. In this particular case we want at least two core files, because we want to know which allocations were made during the first call to test() but never freed.

To do that, we can modify the program to sleep between the calls to test, to give us time to gather the core files. After this slight modification, the revised program looks like this:

from time import sleep
from memory_profiler import profile
import numpy as np
import gc

@profile
def test():
    arr = np.ones((10000, 6912))
    for i in range(2000):
        arr[0:75,:] = np.ones((75, 6912))
    del arr
    gc.collect()
    pass

print('sleep before first test()')
sleep(120)
test()
print('sleep before second test()')
sleep(120)
test()
print('sleep after second test()')
sleep(120)

With those modifications, we can run the program in the background and gather one core before the first call to test(), one core after the first call (before the second), and one core after the second call to test().

First, as an administrative detail, we set the coredump_filter used by the shell to 0x37 so that, when we run python, the process inherits this value and the cores we create will contain information about file-backed memory.

tim@tim-OptiPlex-3020:~$ cat /proc/self/coredump_filter
00000033
tim@tim-OptiPlex-3020:~$ echo 0x37 >/proc/self/coredump_filter
tim@tim-OptiPlex-3020:~$ cat /proc/self/coredump_filter
00000037
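
If you would rather not rely on inheriting the value from the shell, an alternative (my addition, not part of the original procedure) is to have the script set its own filter before it starts sleeping; /proc/self/coredump_filter accepts the same hex mask:

# Equivalent to the echo above, done from inside the python process itself (Linux only).
with open('/proc/self/coredump_filter', 'w') as f:
    f.write('0x37')   # 0x37 adds file-backed private mappings to the default 0x33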

Now we are ready to start the program in the background and gather the first core while the program does the first sleep().

tim@tim-OptiPlex-3020:~$ python3 so4.py &
[2] 125315
tim@tim-OptiPlex-3020:~$ sleep before first test()
sudo gcore -o beforeFirst 125315
[sudo] password for tim: 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f25bbbb012b in select () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/125315/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ad7d8000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ada0c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25adc23000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ade39000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae051000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae2a7000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae522000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae74b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae9d2000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aec50000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aef3c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af145000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af41b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b708b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b7494000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9358000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b99e3000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9cc4000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9eca000.
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile beforeFirst.125315
[Inferior 1 (process 125315) detached]

Then we wait until the first call to test() has finished and gather another core while the program does the second sleep().

sleep before second test()
sudo gcore -o beforeSecond 125315
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f25bbbb012b in select () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/125315/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ad7d8000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ada0c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25adc23000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ade39000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae051000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae2a7000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae522000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae74b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae9d2000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aec50000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aef3c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af145000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af41b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b708b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b7494000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9358000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b99e3000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9cc4000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9eca000.
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile beforeSecond.125315
[Inferior 1 (process 125315) detached]

Then we wait for the second call to test() to complete and gather a third core while the program does the third sleep().

sleep after second test()
sudo gcore -o afterSecond 125315
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f25bbbb012b in select () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/125315/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ad7d8000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ada0c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25adc23000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ade39000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae051000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae2a7000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae522000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae74b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae9d2000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aec50000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aef3c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af145000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af41b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b708b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b7494000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9358000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b99e3000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9cc4000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9eca000.
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile afterSecond.125315
[Inferior 1 (process 125315) detached]

Now we are ready to analyze the cores using chap, which takes a core file as input. The memory that matters most for the size of a process is the writable memory, and we can get some details about it by running chap on the core taken before the first call to test().

tim@tim-OptiPlex-3020:~$ chap beforeFirst.125315
chap> summarize writable
12 ranges take 0x603e000 bytes for use: unknown
3 ranges take 0x1800000 bytes for use: cached pthread stack
41 ranges take 0xa40000 bytes for use: python arena
1 ranges take 0x51c000 bytes for use: libc malloc main arena pages
47 ranges take 0x1e5000 bytes for use: used by module
1 ranges take 0x91000 bytes for use: libc malloc mmapped allocation
1 ranges take 0x21000 bytes for use: main stack
106 writable ranges use 0x8a31000 (144,904,192) bytes.

In the above, notice the line with "python arena". That one is associated with python's arena-based allocator. Also notice the lines with "libc malloc main arena pages" and "libc malloc mmapped allocation". Those are, not surprisingly, associated with libc malloc, which is used both by native libraries and, in some cases, by python itself, for example when an allocation exceeds a certain size.
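
As background on that size split (general CPython behaviour I am stating from memory, not something derived from these cores): CPython's small-object allocator serves requests up to about 512 bytes from python arenas and hands larger requests to libc malloc, and the multi-megabyte NumPy buffers in this test are allocated by NumPy's own routines, which also end up in libc malloc. A tiny sketch:

import sys

small = b'x' * 100        # ~133 bytes in total: small enough for pymalloc / a python arena
large = b'x' * 100_000    # far above the small-object threshold: handed to libc malloc
print(sys.getsizeof(small), sys.getsizeof(large))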

As I mentioned earlier, these large ranges get carved up to satisfy small allocation requests. We can get a count of the used allocations (the ones that have not yet been freed) and of the free allocations (which occupy space that has not yet been given back to the OS and can be reused for future allocations).

chap> count used
114423 allocations use 0xf4df58 (16,047,960) bytes.
chap> count free
730 allocations use 0x5fb30 (391,984) bytes.

Now we can compare by running the same three commands in chap on the second core. What we see is that all of the growth appeared in the ranges summarized as "libc malloc main arena pages", which grew from 0x51c000 bytes to 0x926000 bytes, or slightly more than 4 MiB.

tim@tim-OptiPlex-3020:~$ chap beforeSecond.125315
chap> summarize writable
12 ranges take 0x603e000 bytes for use: unknown
3 ranges take 0x1800000 bytes for use: cached pthread stack
41 ranges take 0xa40000 bytes for use: python arena
1 ranges take 0x926000 bytes for use: libc malloc main arena pages
47 ranges take 0x1e5000 bytes for use: used by module
1 ranges take 0x91000 bytes for use: libc malloc mmapped allocation
1 ranges take 0x21000 bytes for use: main stack
106 writable ranges use 0x8e3b000 (149,139,456) bytes.

If we drill down further we can see that the used allocations grew by a bit less than 100,000 bytes and the free allocations grew by about 4 MiB.

chap> count used
114686 allocations use 0xf64ac8 (16,141,000) bytes.
chap> count free
1312 allocations use 0x4522e8 (4,530,920) bytes.
chap> 
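
As a side note (my own arithmetic, not chap output): the growth in free allocations is roughly the size of one float64 temporary produced by np.ones((75, 6912)), which at least suggests that what is being retained is the space for one such discarded temporary:

growth_pages = 0x926000 - 0x51c000   # growth of "libc malloc main arena pages": 4,235,264 bytes
growth_free  = 4530920 - 391984      # growth of free allocations between the two cores: 4,138,936 bytes
one_temp     = 75 * 6912 * 8         # one float64 np.ones((75, 6912)) buffer: 4,147,200 bytes
print(growth_pages, growth_free, one_temp)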

This basically confirms the theory by @hpaulj, with the exception that there was a little bit of growth in used allocations during that first run of test(). It would probably be interesting to understand that, but for now I'll just note that the bulk of the growth is explained by free allocations. This is not bad, because those areas of memory are available for reuse.

So now we check what happened during the second run of test(). We can see that the process didn't get larger, but there is one more used allocation and very slightly less memory used for free allocations.

tim@tim-OptiPlex-3020:~$ chap afterSecond.125315
chap> summarize writable
12 ranges take 0x603e000 bytes for use: unknown
3 ranges take 0x1800000 bytes for use: cached pthread stack
41 ranges take 0xa40000 bytes for use: python arena
1 ranges take 0x926000 bytes for use: libc malloc main arena pages
47 ranges take 0x1e5000 bytes for use: used by module
1 ranges take 0x91000 bytes for use: libc malloc mmapped allocation
1 ranges take 0x21000 bytes for use: main stack
106 writable ranges use 0x8e3b000 (149,139,456) bytes.
chap> count used
114687 allocations use 0xf64ca8 (16,141,480) bytes.
chap> count free
1249 allocations use 0x452148 (4,530,504) bytes.
chap> 

So the second run of test() used allocations that were left free after the first run, then freed most of those allocations again when they were not needed. This is working as expected.

One might still ask for an explanation of the extra used allocations after the first call to test(), and of the one extra used allocation after the second call to test(). It would be possible to dig into that using the existing core files, but I will stop here because it takes more time and I have already shown the following:

  1. The memory profiler is basically correct that the program grew by roughly 4 MiB during the first run.
  2. This is at least mostly not a problem because most of the additional memory is free within the process and available for future allocations.
  3. There may or may not be some very small growth after the first call to test() but if so it must be extremely slow because one can run test() in a loop without any obvious signs of growth. If someone decides to run test() in a loop for a really long time and observes that the process does in fact eventually get larger, feel free to ask another question about why and I will continue the analysis as an answer to that question.

answered Oct 21 '22 by Tim Boddy