I have 50 pickle files that are 0.5 GB each. Each pickle file contains a list of custom class objects. I have no trouble loading the files individually using the following function:
import pickle

def loadPickle(fp):
    with open(fp, 'rb') as fh:
        listOfObj = pickle.load(fh)
    return listOfObj
However, when I try to iteratively load the files I get a memory leak.
l = ['filepath1', 'filepath2', 'filepath3', 'filepath4']
for fp in l:
    x = loadPickle(fp)
    print('loaded {0}'.format(fp))
My memory overflows before loaded filepath2 is printed.
How can I write code that guarantees that only a single pickle is loaded during each iteration?
Answers to related questions on SO suggest using objects defined in the weakref module, or explicit garbage collection using the gc module, but I am having a difficult time understanding how I would apply these methods to my particular use case. This is because I have an insufficient understanding of how referencing works under the hood.
Related: Python garbage collection
The advantage of using pickle is that it can serialize pretty much any Python object without requiring any extra code. It's also smart in that it will only write out any single object once, which makes it effective for storing recursive structures like graphs.
However, pickle doesn't support appending, so you'll have to save your data to a new file each time the program is run (come up with a different file name: ask the user, or use a command-line parameter such as -o test.txt). On a related topic: don't use pickle.
You then open the pickle file for reading, load the content into a new variable, and close the file. Loading is done through the pickle.load() method.
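As a minimal sketch of the dump/load round trip described above (the file name and sample data here are illustrative, not from the question):

```python
import os
import pickle
import tempfile

# Illustrative data standing in for the list of custom class objects.
data = [{'id': 1}, {'id': 2}]

path = os.path.join(tempfile.gettempdir(), 'sample.pkl')

# Dump the list to a file...
with open(path, 'wb') as fh:
    pickle.dump(data, fh)

# ...then open for reading, load into a new variable, and close the file.
with open(path, 'rb') as fh:
    loaded = pickle.load(fh)

print(loaded == data)  # prints True
```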
You can fix that by adding x = None right after for fp in l:. The reason this works is that it drops the reference held by variable x, hence allowing the Python garbage collector to free that memory before loadPickle() is called the second time.