
How do I prevent memory leak when I load large pickle files in a for loop?

I have 50 pickle files that are 0.5 GB each. Each pickle file contains a list of custom class objects. I have no trouble loading the files individually using the following function:

import pickle

def loadPickle(fp):
    # Read one pickle file and return the list of objects it contains.
    with open(fp, 'rb') as fh:
        listOfObj = pickle.load(fh)
    return listOfObj

However, when I try to iteratively load the files I get a memory leak.

l = ['filepath1', 'filepath2', 'filepath3', 'filepath4']
for fp in l:
    x = loadPickle(fp)
    print( 'loaded {0}'.format(fp) )

My memory overflows before loaded filepath2 is printed. How can I write code that guarantees that only a single pickle is loaded during each iteration?

Answers to related questions on SO suggest using objects defined in the weakref module or explicit garbage collection using the gc module, but I am having a difficult time understanding how I would apply these methods to my particular use case. This is because I have an insufficient understanding of how referencing works under the hood.

Related: Python garbage collection

asked Apr 29 '13 21:04 by Lionel Brooks


1 Answer

You can fix that by adding x = None right after for fp in l:.

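This works because it drops the reference that x holds to the previously loaded list, which lets the Python garbage collector free that memory before loadPickle() is called the second time. Without it, x keeps the old list alive until loadPickle() returns and the assignment rebinds x, so two lists are in memory at once while the next file is loading.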
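As a minimal sketch of the fix described above, the loop could look like the following. The names load_pickle, process_all and handle are hypothetical and used only for illustration, and the gc.collect() call is an optional extra that is not part of the answer itself; it only matters if the loaded objects contain reference cycles.

```python
import gc
import pickle


def load_pickle(fp):
    """Load and return the object stored in the pickle file at fp."""
    with open(fp, 'rb') as fh:
        return pickle.load(fh)


def process_all(paths, handle):
    """Load each pickle in turn so that at most one is held at a time.

    `handle` is a hypothetical callback that does the per-file work.
    """
    for fp in paths:
        x = None       # drop the reference to the previously loaded list
        gc.collect()   # optional (assumption): reclaim objects stuck in cycles
        x = load_pickle(fp)
        handle(fp, x)
```

This only helps if nothing else keeps a reference to the previous list, so handle should not store the loaded objects anywhere that outlives the iteration.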

answered Sep 21 '22 22:09 by Ionut Hulub