I have 50 pickle files that are 0.5 GB each. Each pickle file contains a list of custom class objects. I have no trouble loading the files individually using the following function:
import pickle

def loadPickle(fp):
    with open(fp, 'rb') as fh:
        listOfObj = pickle.load(fh)
    return listOfObj
However, when I try to iteratively load the files I get a memory leak.
l = ['filepath1', 'filepath2', 'filepath3', 'filepath4']
for fp in l:
    x = loadPickle(fp)
    print('loaded {0}'.format(fp))
My memory overflows before loaded filepath2 is printed.
How can I write code that guarantees that only a single pickle is loaded during each iteration?
Answers to related questions on SO suggest using objects defined in the weakref module, or explicit garbage collection using the gc module, but I am having a difficult time understanding how I would apply these methods to my particular use case. This is because I have an insufficient understanding of how referencing works under the hood.
Related: Python garbage collection
The advantage of using pickle is that it can serialize pretty much any Python object without requiring any extra code. It's also smart in that it will only write out any single object once, which makes it effective for storing recursive structures like graphs.
However, pickle doesn't support appending, so you'll have to save your data to a new file each time the program is run (come up with a different file name: ask the user, or use a command-line parameter such as -o test.txt). On a related topic: don't use pickle.
You then open the pickle file for reading, load the content into a new variable, and close the file. Loading is done through the pickle.load() method.
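As a minimal sketch of the dump/load round trip described above (the file name and sample data here are illustrative, not from the question):

```python
import os
import pickle
import tempfile

# Illustrative data standing in for the list of custom class objects.
data = [{'id': 1}, {'id': 2}]

path = os.path.join(tempfile.gettempdir(), 'sample.pkl')

# Dump the list to a file...
with open(path, 'wb') as fh:
    pickle.dump(data, fh)

# ...then open for reading, load into a new variable, and close the file.
with open(path, 'rb') as fh:
    loaded = pickle.load(fh)

print(loaded == data)  # prints True
```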
You can fix that by adding x = None right after for fp in l:. The reason this works is that it drops the reference held by variable x, hence allowing the Python garbage collector to free that memory before loadPickle() is called the second time.