I have a program that contains a large number of objects, many of them NumPy arrays. My program is swapping miserably, and I'm trying to reduce its memory usage, because it actually can't finish on my system with the current memory requirements.
I am looking for a nice profiler that would allow me to check the amount of memory consumed by various objects (I'm envisioning a memory counterpart to cProfile) so that I know where to optimize.
I've heard decent things about Heapy, but unfortunately Heapy does not support NumPy arrays, and most of my program involves NumPy arrays.
You can use it by applying the @profile decorator to any function or method and running python -m memory_profiler myscript.py. You'll see line-by-line memory usage once your script exits.
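For example, a minimal script prepared for memory_profiler might look like the following (make_arrays and the array sizes are made up for illustration; the try/except shim lets the same file also run under a plain interpreter, since memory_profiler injects profile as a builtin when it runs your script):

```python
# minimal sketch of a script instrumented for memory_profiler
# (make_arrays and the sizes are illustrative, not from the original post)
import numpy as np

try:
    profile  # injected as a builtin by `python -m memory_profiler`
except NameError:
    def profile(func):
        # no-op fallback so the script also runs without memory_profiler
        return func

@profile
def make_arrays():
    a = np.zeros((1000, 1000))  # ~8 MiB of float64
    b = a.copy()                # another ~8 MiB
    return a + b                # a third temporary of the same size

if __name__ == "__main__":
    make_arrays()
```

Running it as python -m memory_profiler myscript.py then prints the per-line memory usage and increment for each decorated function.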
One way to tackle the problem, if you are calling lots of different functions and are unsure where the swapping comes from, is to use the new plotting functionality from memory_profiler. First, decorate the different functions you are using with @profile. For simplicity I'll use the example examples/numpy_example.py shipped with memory_profiler, which contains two functions: create_data() and process_data().
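I won't reproduce the shipped file verbatim, but its shape is roughly the following (the sizes here are illustrative; the real process_data() also detrends the concatenated array with scipy.signal.detrend, as the line-by-line output further down shows):

```python
# rough sketch of examples/numpy_example.py (illustrative sizes;
# the shipped version also calls scipy.signal.detrend on the result)
import numpy as np

def create_data():
    # a list of independent arrays, allocated chunk by chunk
    return [np.random.randn(5000, 100) for _ in range(10)]

def process_data(data):
    # np.concatenate copies every chunk into one new contiguous array,
    # so memory roughly doubles while the original list is still referenced
    return np.concatenate(data)
```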
To run your script, instead of invoking it with the Python interpreter, you use the mprof executable:
$ mprof run examples/numpy_example.py
This will create a file called mprofile_??????????.dat, where the ? characters hold numbers representing the current date. To plot the result, simply type mprof plot, and it will generate a plot similar to this (if you have several .dat files it will always take the last one):
Here you see the memory consumption, with brackets indicating when you enter and leave the current function. This way it is easy to see that the function process_data() has a peak of memory consumption. To dig further into your function, you can use the line-by-line profiler to see the memory consumption of each line. This is run with
python -m memory_profiler examples/numpy_example.py
This would give you an output similar to this:
Line # Mem usage Increment Line Contents
================================================
13 @profile
14 223.414 MiB 0.000 MiB def process_data(data):
15 414.531 MiB 191.117 MiB data = np.concatenate(data)
16 614.621 MiB 200.090 MiB detrended = scipy.signal.detrend(data, axis=0)
17 614.621 MiB 0.000 MiB return detrended
where it is clear that scipy.signal.detrend is allocating a huge amount of memory.
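If that detrend copy turns out to be the bottleneck, one mitigation (assuming you no longer need the untrended data) is the overwrite_data flag of scipy.signal.detrend, which lets it work in the input buffer instead of copying it first. A minimal sketch:

```python
# sketch: letting detrend reuse its input instead of copying it
import numpy as np
from scipy.signal import detrend

data = np.random.randn(100_000, 10)
# overwrite_data=True permits detrend to modify the input buffer,
# avoiding one full-size temporary (the original `data` is clobbered)
detrended = detrend(data, axis=0, overwrite_data=True)
```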
Have a look at memory_profiler. It provides line-by-line profiling and IPython integration, which makes it very easy to use:
In [1]: import numpy as np
In [2]: %memit np.zeros(1e7)
maximum of 3: 70.847656 MB per loop
Update
As mentioned by @WickedGrey, there seems to be a bug (see the GitHub issue tracker) when calling a function more than once, which I can reproduce:
In [2]: for i in range(10):
...: %memit np.zeros(1e7)
...:
maximum of 1: 70.894531 MB per loop
maximum of 1: 70.894531 MB per loop
maximum of 1: 70.894531 MB per loop
maximum of 1: 70.894531 MB per loop
maximum of 1: 70.894531 MB per loop
maximum of 1: 70.894531 MB per loop
maximum of 1: 70.902344 MB per loop
maximum of 1: 70.902344 MB per loop
maximum of 1: 70.902344 MB per loop
maximum of 1: 70.902344 MB per loop
However, I don't know to what extent the results may be influenced (not much in my example, it seems, so depending on your use case it may still be useful) or when this issue may be fixed. I asked about it on GitHub.
Since NumPy 1.7 there exists a semi-built-in way to track memory allocations:
https://github.com/numpy/numpy/tree/master/tools/allocation_tracking