I have a numpy
script that -- according to top
-- is using about 5GB of RAM:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16994 aix 25 0 5813m 5.2g 5.1g S 0.0 22.1 52:19.66 ipython
Is there a memory profiler that would enable me to get some idea about the objects that are taking most of that memory?
I've tried heapy
, but guppy.hpy().heap()
is giving me this:
Partition of a set of 90956 objects. Total size = 12511160 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 42464 47 4853112 39 4853112 39 str
1 22147 24 1928768 15 6781880 54 tuple
2 287 0 1093352 9 7875232 63 dict of module
3 5734 6 733952 6 8609184 69 types.CodeType
4 498 1 713904 6 9323088 75 dict (no owner)
5 5431 6 651720 5 9974808 80 function
6 489 1 512856 4 10487664 84 dict of type
7 489 1 437704 3 10925368 87 type
8 261 0 281208 2 11206576 90 dict of class
9 1629 2 130320 1 11336896 91 __builtin__.wrapper_descriptor
<285 more rows. Type e.g. '_.more' to view.>
For some reason, it's only accounting for 12MB of the 5GB (the bulk of the memory is almost certainly used by numpy
arrays).
Any suggestions as to what I might be doing wrong with heapy
or what other tools I should try (other than those already mentioned in this thread)?
The easiest way to profile a single method or function is the open source memory-profiler package. It's similar to line_profiler , which I've written about before . You can use it by putting the @profile decorator around any function or method and running python -m memory_profiler myscript.
The Memory Profiler is a component in the Android Profiler that helps you identify memory leaks and memory churn that can lead to stutter, freezes, and even app crashes. It shows a realtime graph of your app's memory use and lets you capture a heap dump, force garbage collections, and track memory allocations.
1. NumPy uses much less memory to store data. The NumPy arrays takes significantly less amount of memory as compared to python lists. It also provides a mechanism of specifying the data types of the contents, which allows further optimisation of the code.
A NumPy array can be specified to be stored in row-major format, using the keyword argument order='C' , and the column-major format, using the keyword argument order='F' , when the array is created or reshaped. The default format is row-major.
Numpy (and its library bindings, more on that in a minute) use C malloc to allocate space, which is why memory used by big numpy allocations doesn't show up in the profiling of things like heapy and never gets cleaned up by the garbage collector.
The usual suspects for big leaks are actually scipy or numpy library bindings, rather than python code itself. I got burned badly last year by the default scipy.linalg interface to umfpack, which leaked memory at the rate of about 10Mb a call. You might want to try something like valgrind to profile the code. It can often give some hints as to where to look at where there might be leaks.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With