I'm interested in finding out how much the total size of Python's heap grows when a large object is loaded. heapy seems to be what I need, but I don't understand the results.
I have a 350 MB pickle file containing a pandas DataFrame with about 2.5 million entries. When I load the file and then inspect the heap with heapy, it reports that only roughly 8 MB of objects have been added to the heap.
import guppy
import pickle

h = guppy.hpy()
h.setrelheap()  # use the current heap as the baseline
df = pickle.load(open('test-df.pickle'))
h.heap()        # report only objects added since the baseline
This gives the following output:
Partition of a set of 95278 objects. Total size = 8694448 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  44700  47  4445944  51    4445944  51 str
     1  25595  27  1056560  12    5502504  63 tuple
     2   6935   7   499320   6    6001824  69 types.CodeType
...
What confuses me is the Total size of 8694448 bytes. That's just 8 MB. Why doesn't Total size reflect the size of the whole DataFrame df?
(Using Python 2.7.3, heapy 0.1.10, Linux 3.2.0-48-generic-pae (Ubuntu), i686)
You could try pympler, which worked for me the last time I checked. If you are just interested in the total memory increase rather than in a specific class, you could use an OS-specific call to get the total memory used. E.g., on a Unix-based OS, you could do something like the following before and after loading the object and take the difference.
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
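For example, a minimal sketch of that before/after diff might look like this (the file name test-df.pickle is taken from the question; note that ru_maxrss is reported in kilobytes on Linux, but in bytes on macOS):

import pickle
import resource

def peak_rss_kb():
    # Peak resident set size of the current process (kilobytes on Linux)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss_kb()
df = pickle.load(open('test-df.pickle'))  # the 350 MB pickle from the question
after = peak_rss_kb()

print('Process memory grew by roughly %d MB' % ((after - before) / 1024))

Since ru_maxrss is a peak value it never decreases when memory is freed, but for measuring the cost of a single large load that is usually good enough. Pympler's asizeof.asizeof(df) is another option if you want a per-object figure, though its results for NumPy-backed containers can vary.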
I had a similar problem when I was trying to find out why my 500 MB CSV files were taking up to 5 GB in memory. Pandas is basically built on top of NumPy and therefore uses C malloc to allocate space. This is why it doesn't show up in heapy, which only profiles pure Python objects. One solution might be to look into Valgrind to track down your memory leaks.
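Not part of the original answer, but a quick way to confirm that the data lives in NumPy buffers heapy cannot see is to ask NumPy for the size of the underlying arrays directly. A small sketch with a hypothetical numeric DataFrame standing in for df:

import numpy as np
import pandas as pd

# Hypothetical frame standing in for the 2.5-million-entry DataFrame in the question
df = pd.DataFrame(np.random.randn(1000000, 5))

# nbytes reports the size of the raw NumPy buffer, which heapy does not traverse
print('NumPy buffer size: %.1f MB' % (df.values.nbytes / 1024.0 / 1024.0))

Newer pandas versions also offer df.memory_usage() for a per-column breakdown.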