I'm using this simple code and observing monotonically increasing memory usage. I'm using this little module to dump stuff to disk. I observed it happens with unicode strings and not with integers, is there something I'm doing wrong?
When I do:
>>> from utils.diskfifo import DiskFifo
>>> df=DiskFifo()
>>> for i in xrange(1000000000):
... df.append(i)
Memory consumption is stable
but when I do:
>>> while True:
... a={'key': u'value', 'key2': u'value2'}
... df.append(a)
It goes to the roof. Any hints? below the module...
import tempfile
import cPickle
class DiskFifo:
def __init__(self):
self.fd = tempfile.TemporaryFile()
self.wpos = 0
self.rpos = 0
self.pickler = cPickle.Pickler(self.fd)
self.unpickler = cPickle.Unpickler(self.fd)
self.size = 0
def __len__(self):
return self.size
def extend(self, sequence):
map(self.append, sequence)
def append(self, x):
self.fd.seek(self.wpos)
self.pickler.dump(x)
self.wpos = self.fd.tell()
self.size = self.size + 1
def next(self):
try:
self.fd.seek(self.rpos)
x = self.unpickler.load()
self.rpos = self.fd.tell()
return x
except EOFError:
raise StopIteration
def __iter__(self):
self.rpos = 0
return self
The Python program, just like other programming languages, experiences memory leaks. Memory leaks in Python happen if the garbage collector doesn't clean and eliminate the unreferenced or unused data from Python.
Those numbers can easily fit in a 64-bit integer, so one would hope Python would store those million integers in no more than ~8MB: a million 8-byte objects. In fact, Python uses more like 35MB of RAM to store these numbers. Why? Because Python integers are objects, and objects have a lot of memory overhead.
Python doesn't limit memory usage on your program. It will allocate as much memory as your program needs until your computer is out of memory. The most you can do is reduce the limit to a fixed upper cap. That can be done with the resource module, but it isn't what you're looking for.
The pickler module is storing all objects it has seen in its memo, so it doesn't have to pickle the same thing twice. You want to skip this (so references to your objects aren't stored in your pickler object) and clear the memo before dumping:
def append(self, x):
self.fd.seek(self.wpos)
self.pickler.clear_memo()
self.pickler.dump(x)
self.wpos = self.fd.tell()
self.size = self.size + 1
Source: http://docs.python.org/library/pickle.html#pickle.Pickler.clear_memo
Edit: You can actually watch the size of the memo go up as you pickle your objects by using the following append function:
def append(self, x):
self.fd.seek(self.wpos)
print len(self.pickler.memo)
self.pickler.dump(x)
self.wpos = self.fd.tell()
self.size = self.size + 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With