I have a program that contains a large number of objects, many of them NumPy arrays. My program is swapping miserably, and I'm trying to reduce its memory usage, because it actually can't finish on my system with the current memory requirements.
I am looking for a nice profiler that would allow me to check the amount of memory consumed by various objects (I'm envisioning a memory counterpart to cProfile) so that I know where to optimize.
I've heard decent things about Heapy, but unfortunately Heapy does not support NumPy arrays, and most of my program involves NumPy arrays.
You can use it by applying the @profile decorator to any function or method and running python -m memory_profiler myscript.py. You'll see line-by-line memory usage once your script exits.
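For example, a minimal script prepared for memory_profiler might look like the following (make_arrays and the array sizes are made up for illustration; the try/except shim lets the same file also run under a plain interpreter, since memory_profiler injects profile as a builtin when it runs your script):

```python
# minimal sketch of a script instrumented for memory_profiler
# (make_arrays and the sizes are illustrative, not from the original post)
import numpy as np

try:
    profile  # injected as a builtin by `python -m memory_profiler`
except NameError:
    def profile(func):
        # no-op fallback so the script also runs without memory_profiler
        return func

@profile
def make_arrays():
    a = np.zeros((1000, 1000))  # ~8 MiB of float64
    b = a.copy()                # another ~8 MiB
    return a + b                # a third temporary of the same size

if __name__ == "__main__":
    make_arrays()
```

Running it as python -m memory_profiler myscript.py then prints the per-line memory usage and increment for each decorated function.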
One way to tackle the problem, if you are calling lots of different functions and are unsure where the swapping comes from, is to use the new plotting functionality from memory_profiler. First, decorate the different functions you are using with @profile. For simplicity I'll use the example examples/numpy_example.py shipped with memory_profiler, which contains two functions: create_data() and process_data().
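I won't reproduce the shipped file verbatim, but its shape is roughly the following (the sizes here are illustrative; the real process_data() also detrends the concatenated array with scipy.signal.detrend, as the line-by-line output further down shows):

```python
# rough sketch of examples/numpy_example.py (illustrative sizes;
# the shipped version also calls scipy.signal.detrend on the result)
import numpy as np

def create_data():
    # a list of independent arrays, allocated chunk by chunk
    return [np.random.randn(5000, 100) for _ in range(10)]

def process_data(data):
    # np.concatenate copies every chunk into one new contiguous array,
    # so memory roughly doubles while the original list is still referenced
    return np.concatenate(data)
```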
To run your script, instead of invoking it with the Python interpreter, you use the mprof executable:
$ mprof run examples/numpy_example.py
This will create a file called mprofile_??????????.dat, where the ? characters hold numbers representing the current date. To plot the result, simply type mprof plot, and it will generate a plot similar to this (if you have several .dat files it will always take the last one):
Here you see the memory consumption, with brackets indicating when you enter and leave the current function. This way it is easy to see that the function process_data() has a peak of memory consumption. To dig further into your function, you can use the line-by-line profiler to see the memory consumption of each line. This is run with
python -m memory_profiler examples/numpy_example.py
This would give you an output similar to this:
Line # Mem usage Increment Line Contents
================================================
13 @profile
14 223.414 MiB 0.000 MiB def process_data(data):
15 414.531 MiB 191.117 MiB data = np.concatenate(data)
16 614.621 MiB 200.090 MiB detrended = scipy.signal.detrend(data, axis=0)
17 614.621 MiB 0.000 MiB return detrended
where it is clear that scipy.signal.detrend is allocating a huge amount of memory.
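If that detrend copy turns out to be the bottleneck, one mitigation (assuming you no longer need the untrended data) is the overwrite_data flag of scipy.signal.detrend, which lets it work in the input buffer instead of copying it first. A minimal sketch:

```python
# sketch: letting detrend reuse its input instead of copying it
import numpy as np
from scipy.signal import detrend

data = np.random.randn(100_000, 10)
# overwrite_data=True permits detrend to modify the input buffer,
# avoiding one full-size temporary (the original `data` is clobbered)
detrended = detrend(data, axis=0, overwrite_data=True)
```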
Have a look at memory_profiler. It provides line-by-line profiling and IPython integration, which makes it very easy to use:
In [1]: import numpy as np
In [2]: %memit np.zeros(1e7)
maximum of 3: 70.847656 MB per loop
Update
As mentioned by @WickedGrey, there seems to be a bug (see the GitHub issue tracker) when calling a function more than once, which I can reproduce:
In [2]: for i in range(10):
...: %memit np.zeros(1e7)
...:
maximum of 1: 70.894531 MB per loop
maximum of 1: 70.894531 MB per loop
maximum of 1: 70.894531 MB per loop
maximum of 1: 70.894531 MB per loop
maximum of 1: 70.894531 MB per loop
maximum of 1: 70.894531 MB per loop
maximum of 1: 70.902344 MB per loop
maximum of 1: 70.902344 MB per loop
maximum of 1: 70.902344 MB per loop
maximum of 1: 70.902344 MB per loop
However, I don't know to what extent the results may be influenced (not much in my example, it seems, so depending on your use case it may still be useful) or when this issue may be fixed. I asked about it on GitHub.
Since NumPy 1.7 there exists a semi-built-in way to track memory allocations:
https://github.com/numpy/numpy/tree/master/tools/allocation_tracking