A general question about Python code: how can I most effectively locate the parts of my Python code that are worst with respect to memory usage?
See e.g. this small example:
def my_func():
    a = [1] * (12 ** 4)
    return a

def my_func2():
    b = [2] * (10 ** 7)
    return b

if __name__ == '__main__':
    a1 = my_func()
    a2 = my_func2()
How can I, in an automated way, tell that a2 is much larger than a1 in size?
And how can I, still automated, trace this back to my_func() and my_func2()?
For C/C++ code I would use valgrind --tool=massif, which can directly locate the heavyweights in terms of memory usage, but for Python I need your help.
Meliae appears to give part of the answer, but nowhere near as well as massif does for C/C++.
Python uses part of its memory for internal bookkeeping and non-object memory. The rest is dedicated to object storage (your int, dict, and the like). This is somewhat simplified; if you want the full picture, you can check out the CPython source code, where all of this memory management happens.
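To get a feel for the per-object overhead this implies, here is a minimal sketch; the exact byte counts depend on the CPython version and platform:
import sys
# Every value is a full object with a header, so even "small" data carries
# overhead beyond its payload.
print(sys.getsizeof(0))    # a small int is an object, not 4 or 8 raw bytes
print(sys.getsizeof([]))   # an empty list still has a header and a pointer table
print(sys.getsizeof({}))   # likewise for an empty dict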
In Python, everything is an object; numbers, lists, and dictionaries are all treated as objects. You can get the memory address of an object with the built-in id() function: in CPython, id() returns the address of that particular object.
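A quick illustration; the printed address will differ from run to run:
x = [1, 2, 3]
# In CPython, id() returns the object's memory address as an integer.
print(id(x))
print(hex(id(x)))  # the same address in hexadecimal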
You can release the memory held by a variable, list, or array in two ways: del and gc.collect(). del removes a name's reference to an object (CPython frees the object once no references remain), and gc.collect() asks the garbage collector to reclaim unreachable objects. Releasing memory this way helps keep a process from growing without bound.
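A minimal sketch of both approaches; what actually gets freed, and when, depends on what else still references the object:
import gc

big = [0] * (10 ** 7)   # allocate a large list
del big                 # drop the only reference; CPython frees the list right away

# gc.collect() forces a collection pass and returns the number of unreachable
# objects it found; it is mainly useful for objects caught in reference cycles.
print(gc.collect())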
When the programmer forgets to release unused memory, the process keeps growing and you get a memory leak. Memory leaks are typically caused by lingering large objects that are never released and by reference cycles within the code. The gc module is helpful for debugging memory leaks in Python.
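For example, two objects that reference each other are only reclaimed by the cycle collector, and gc lets you observe that happening; a small sketch:
import gc

class Node:
    pass

a = Node()
b = Node()
a.other = b
b.other = a      # a and b now form a reference cycle

del a, b         # the names are gone, but the cycle keeps both objects alive
print(gc.collect())   # reports the number of unreachable objects the collector found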
locals() (resp. globals()) returns a dictionary with all the local (resp. global) alive objects. You can use them like this:
import sys
# Map each local name to the (shallow) size of the object it refers to.
sizes = {name: sys.getsizeof(value) for name, value in locals().items()}
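Sorting that dictionary immediately points at the heavy names, which is the kind of automation the question asks for; a sketch that only sees names in the current scope:
import sys

a1 = [1] * (12 ** 4)
a2 = [2] * (10 ** 7)

sizes = {name: sys.getsizeof(value) for name, value in globals().items()}
# Print the five largest objects by (shallow) size.
for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:5]:
    print(name, size)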
The drawback is that it is not aware of objects that don't fully implement __sizeof__, like Numpy arrays, nor of memory that is only referenced. For example, if you do:
print(sys.getsizeof(a2))
print(sys.getsizeof(a1))
a2.append(a1)
print(sys.getsizeof(a2))
The output will be:
40000036
82980
45000064 ---> the list grew by about 5 MB (roughly 60 times a1's reported size, due to list over-allocation), even though only a reference to a1 was added
And, of course, just deleting a1 will not free its 82 kB, because there is still a reference to it in a2. But we can make it even weirder:
a2 = my_func2()
print(sys.getsizeof(a2))
a2.append(a2)
print(sys.getsizeof(a2))
And the output will look strangely familiar:
40000036
45000064
Other tools may implement workarounds for this and walk the reference tree, but the general problem of a full memory analysis in Python remains unsolved. And it only gets worse when objects store data via the C API, outside the scope of the reference counter, as happens e.g. with Numpy arrays.
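To illustrate with NumPy (this assumes NumPy is installed; exact numbers depend on the NumPy version and platform), sys.getsizeof reports almost nothing for an array view whose data buffer is owned by another array:
import sys
import numpy as np

a = np.zeros(10 ** 6)   # about 8 MB of data owned by this array
view = a[::2]           # a view: it shares a's buffer and owns no data itself

print(a.nbytes, view.nbytes)                   # the bytes the data actually occupies / maps
print(sys.getsizeof(a), sys.getsizeof(view))   # getsizeof sees the array header, plus the
                                               # buffer only when the array owns it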
That said, there are tools that are "good enough" for most practical situations. As in the referenced link, Heapy is a very good option.
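A minimal sketch of how Heapy is typically used, assuming the guppy package (guppy3 on Python 3) is installed:
from guppy import hpy

h = hpy()
a1 = [1] * (12 ** 4)
a2 = [2] * (10 ** 7)

# h.heap() returns a summary of all reachable objects, grouped by type and
# sorted by total size, which points you at the heavy hitters.
print(h.heap())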