I'm doing some sanity checks with Pympler to make sure I understand its results before I profile an actual script, but I'm a bit puzzled by what I see. Here are the sanity checks I've tried:
SANITY CHECK 1: I fire up a Python (3) console and do the following:
from pympler import summary, muppy
sum = summary.summarize(muppy.get_objects())
summary.print_(sum)
This results in the following summary:
types | # objects | total size
==================================== | =========== | ============
<class 'str | 16047 | 1.71 MB
<class 'dict | 2074 | 1.59 MB
<class 'type | 678 | 678.27 KB
<class 'code | 4789 | 673.68 KB
<class 'set | 464 | 211.00 KB
<class 'list | 1319 | 147.16 KB
<class 'tuple | 1810 | 120.03 KB
<class 'weakref | 1269 | 99.14 KB
<class 'wrapper_descriptor | 1124 | 87.81 KB
<class 'builtin_function_or_method | 918 | 64.55 KB
<class 'abc.ABCMeta | 64 | 62.25 KB
<class 'method_descriptor | 877 | 61.66 KB
<class 'int | 1958 | 58.88 KB
<class 'getset_descriptor | 696 | 48.94 KB
function (__init__) | 306 | 40.64 KB
If I've just fired up a new Python session, how are there all these strings, dictionaries, lists etc. in memory already? I don't think that Pympler is summarizing the results across all sessions (that would make no sense, but it's the only possibility I could think of).
SANITY CHECK 2: Since I don't quite understand the summary results of a tabula rasa Python session, let's look at the difference in summary after I've defined a few variables/data structures. I fire up another console and do the following:
from pympler import summary, muppy
sum = summary.summarize(muppy.get_objects())
a = {}
b = {}
c = {}
d = {'a': [0, 0, 1, 2], 't': [3, 3, 3, 1]}
sum1 = summary.summarize(muppy.get_objects())
summary.print_(summary.get_diff(sum, sum1))
This results in the following summary:
types | # objects | total size
============================== | =========== | ============
<class 'list | 3247 | 305.05 KB
<class 'str | 3234 | 226.04 KB
<class 'int | 552 | 15.09 KB
<class 'dict | 1 | 480 B
function (_keys) | 0 | 0 B
function (get_path) | 0 | 0 B
function (http_open) | 0 | 0 B
function (memoize) | 0 | 0 B
function (see) | 0 | 0 B
function (recvfrom) | 0 | 0 B
function (rfind) | 0 | 0 B
function (wm_focusmodel) | 0 | 0 B
function (_parse_makefile) | 0 | 0 B
function (_decode_pax_field) | 0 | 0 B
function (__gt__) | 0 | 0 B
I thought I'd just initialized four new dictionaries (albeit 3 are empty), so why does Muppy show a difference of only 1 new dictionary object? Furthermore, why are there thousands of new strings and lists, not to mention the ints?
SANITY CHECK 3: Yet again, I start a new Python session but this time want to see how Pympler handles more complex data types like a list of dictionaries.
from pympler import muppy, summary
sum = summary.summarize(muppy.get_objects())
a = [{}, {}, {}, {'a': [0, 0, 1, 2], 't': [3, 3, 3, 1]}, {'a': [1, 2, 3, 4]}]
sum1 = summary.summarize(muppy.get_objects())
summary.print_(summary.get_diff(sum, sum1))
Which results in the following summary:
types | # objects | total size
===================================================== | =========== | ============
<class 'list | 3233 | 303.88 KB
<class 'str | 3270 | 230.71 KB
<class 'int | 554 | 15.16 KB
<class 'dict | 10 | 5.53 KB
<class 'code | 16 | 2.25 KB
<class 'type | 2 | 1.98 KB
<class 'tuple | 6 | 512 B
<class 'getset_descriptor | 4 | 288 B
function (__init__) | 2 | 272 B
<class '_frozen_importlib_external.SourceFileLoader | 3 | 168 B
<class '_frozen_importlib.ModuleSpec | 3 | 168 B
<class 'weakref | 2 | 160 B
function (__call__) | 1 | 136 B
function (Find) | 1 | 136 B
function (<lambda>) | 1 | 136 B
Even though the nesting of lists and dictionaries is a bit convoluted, by my count I added five new dictionaries and four new lists.
Can someone explain how Muppy is counting objects?
Regarding get_objects in a new Python session: summary.summarize(muppy.get_objects()) returns any objects instantiated during interpreter startup and while from pympler import summary, muppy ran, which explains the large counts.
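A quick way to confirm that these objects come from interpreter startup (and from importing modules), not from earlier sessions, is to look at what the collector already tracks in a fresh interpreter. This is a minimal sketch using only the standard library; the exact counts depend on the Python version and on what has been imported, and gc.get_objects() lists only GC-tracked objects, so it will not match muppy's numbers exactly:

```python
import gc
import sys
from collections import Counter

# Every object the garbage collector tracks right now: interpreter startup
# plus whatever this script has created or imported so far.
startup_objects = gc.get_objects()
print(len(startup_objects), "GC-tracked objects")
print(len(sys.modules), "modules already imported")

# Tally the most common tracked types, roughly what summary.print_ lists.
by_type = Counter(type(o).__name__ for o in startup_objects)
for name, count in by_type.most_common(5):
    print(f"{name:>10} {count}")
```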
Regarding the difference between two get_objects invocations: remember that the sum object generated by summary.summarize() was created after the first snapshot, which explains the "thousands of new strings and lists". You can fix this by rewriting your test as:
from pympler import summary, muppy
o1 = muppy.get_objects()
a = {}
b = {}
c = {}
d = {'a': [0, 0, 1, 2], 't': [3, 3, 3, 1]}
o2 = muppy.get_objects()
summary.print_(summary.get_diff(summary.summarize(o1), summary.summarize(o2)))
This will reduce the extraneous diffs to the large list o1 and a couple of other objects:
>>> diff = muppy.get_diff(o1, o2)
>>> for o in diff['+']:
...     print("%s - %s" % (type(o), o if len(o) < 10 else "long list"))
...
<class 'str'> - o2
<class 'list'> - long list
<class 'dict'> - {'a': [0, 0, 1, 2], 't': [3, 3, 3, 1]}
<class 'list'> - ['o2', 'muppy', 'get_objects']
<class 'list'> - [0, 0, 1, 2]
<class 'list'> - [3, 3, 3, 1]
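To illustrate where the "thousands of new strings and lists" in the original test came from, here is a sketch that uses only the gc module. toy_summarize is a hypothetical stand-in, not pympler's code; it assumes, as pympler's summary.summarize appears to do, that each summary row is a [type, count, size] list. Because the rows are built between the two snapshots, every row list shows up as new in the second one, and muppy would additionally pick up each row's string and ints as referents:

```python
import gc
import sys
from collections import Counter


def toy_summarize(objects):
    """Hypothetical stand-in for summary.summarize: one [type name, count, size] row per type."""
    counts = Counter()
    sizes = Counter()
    for o in objects:
        name = type(o).__qualname__
        counts[name] += 1
        try:
            sizes[name] += sys.getsizeof(o)
        except Exception:  # a few exotic objects may refuse to report a size
            pass
    # Each row is a fresh list holding a str and two ints.
    return [[name, counts[name], sizes[name]] for name in counts]


before = gc.get_objects()
rows = toy_summarize(before)  # built AFTER the first snapshot...
after = gc.get_objects()      # ...so every row list is tracked in this one

before_ids = {id(o) for o in before}  # ids are unique: `before` keeps them alive
new_lists = [o for o in after if type(o) is list and id(o) not in before_ids]
print(len(rows), "summary rows,", len(new_lists), "new lists in the second snapshot")
```

Snapshotting the raw object lists first and summarizing only afterwards, as in the rewritten test above, avoids this self-inflicted noise.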
To understand this, we need to know what exactly pympler is inspecting.
The muppy.get_objects implementation relies on gc.get_objects(), which returns "a list of all objects tracked by the collector" (in the sense of gc.is_tracked); muppy keeps all of them except stack frames. According to the gc.is_tracked documentation:
"instances of atomic types aren't tracked and instances of non-atomic types (containers, user-defined objects…) are. However, some type-specific optimizations can be present in order to suppress the garbage collector footprint of simple instances (e.g. dicts containing only atomic keys and values)"
muppy then adds the objects referred to by those tracked objects, but skips every referent that is a container object, as determined from its type's __flags__. (This seems to be a bug, since excluding all container objects misses the "simple instances" of container types that are not GC-tracked. Update: this should be fixed in v0.8, released 2019-11-12.)
If you store the object list o2 as suggested above and check which objects are accounted for using:
import gc

def tracked(obj_list, obj):
    # Is obj (by identity) in the muppy snapshot, and is it tracked by the GC?
    return {"tracked_by_muppy": any(item is obj for item in obj_list),
            "gc_tracked": gc.is_tracked(obj)}
You'll see that:
Empty dicts are not GC-tracked, and since they are only referred to from variables (entries in the session's namespace dict, which muppy skips as a container referent), they are not accounted for by muppy:
tracked(o2, a) # => {'tracked_by_muppy': False, 'gc_tracked': False}
The non-trivial dict d is GC-tracked and thus appears in the muppy report:
tracked(o2, d) # => {'tracked_by_muppy': True, 'gc_tracked': True}
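The pre-0.8 collection logic described above can be approximated with the gc module alone, which makes the gap easy to reproduce without pympler. approx_get_objects and is_container_type are hypothetical helpers, not pympler's API; the sketch assumes the referent filter tests the Py_TPFLAGS_HAVE_GC bit (1 << 14 in CPython's object.h) of the type's __flags__. Whether an empty dict is tracked at all is a CPython implementation detail that may vary between versions, so the result for a is reported next to gc.is_tracked(a) rather than assumed:

```python
import gc
import inspect

# Py_TPFLAGS_HAVE_GC from CPython's object.h: set on types whose instances *can* be GC-tracked.
Py_TPFLAGS_HAVE_GC = 1 << 14


def is_container_type(obj):
    """Hypothetical helper: does obj's type support garbage collection?"""
    return bool(type(obj).__flags__ & Py_TPFLAGS_HAVE_GC)


def approx_get_objects():
    """Hypothetical approximation of pre-0.8 muppy.get_objects (not pympler's code)."""
    gc.collect()
    tracked = [o for o in gc.get_objects() if not inspect.isframe(o)]
    result = list(tracked)
    for o in tracked:
        # Add referents, but skip any whose type supports GC, assuming
        # gc.get_objects() already returned them. Untracked "simple instances"
        # of container types (e.g. an empty dict) fall through this gap.
        for ref in gc.get_referents(o):
            if not is_container_type(ref):
                result.append(ref)
    # De-duplicate by identity.
    seen = set()
    unique = []
    for o in result:
        if id(o) not in seen:
            seen.add(id(o))
            unique.append(o)
    return unique


a = {}                                      # may be untracked: a "simple instance"
d = {'a': [0, 0, 1, 2], 't': [3, 3, 3, 1]}  # holds lists, so it is tracked

objs = approx_get_objects()
for name, obj in [("a", a), ("d", d)]:
    found = any(item is obj for item in objs)
    print(name, {"found": found, "gc_tracked": gc.is_tracked(obj)})
```

On interpreters where the empty dict is untracked, a is missed for exactly the reason given above: it is neither returned by gc.get_objects() nor added as a referent, because dict is a container type.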