I have a dictionary object with about 60,000 keys that I cache and access in my Django view. The view provides basic search functionality where I look for a search term in the dictionary like so:
projects_map = cache.get('projects_map')
projects_map.get('search term')
However, just grabbing the cached object (in line 1) causes a giant spike in memory usage on the server - upwards of 100 MB sometimes - and the memory isn't released even after the values are returned and the template is rendered.
How can I keep the memory from jacking up like this? Also, I've tried explicitly deleting the object after I grab the value, but even that doesn't release the memory.
Any help is greatly appreciated.
I decided to implement my own indexing table in which I store the keys and their pickled values. Now, instead of using get() on a dictionary, I use:
ProjectsIndex.objects.get(index_key=<search term>)
and unpickle the value. This seems to take care of the memory issue as I'm no longer loading a giant object into memory. It adds another small query to the page but that's about it. Seems to be the perfect solution...for now.
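The same idea can be sketched without the Django ORM; here sqlite3 stands in for the ProjectsIndex model, and the table and column names (as well as the helper functions) are illustrative, not the poster's actual code:

```python
import pickle
import sqlite3

# One row per key, with the value stored pickled -- a stand-in for the
# ProjectsIndex model described above. Only the row you look up is
# unpickled, so the full 60,000-entry dict never lives in memory.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE projects_index (index_key TEXT PRIMARY KEY, value BLOB)')

def index_set(key, value):
    conn.execute('INSERT OR REPLACE INTO projects_index VALUES (?, ?)',
                 (key, pickle.dumps(value)))

def index_get(key):
    row = conn.execute('SELECT value FROM projects_index WHERE index_key = ?',
                       (key,)).fetchone()
    return pickle.loads(row[0]) if row else None

index_set('search term', {'id': 42, 'name': 'Example project'})
print(index_get('search term'))
```

With the Django model, `ProjectsIndex.objects.get(index_key=...)` plays the role of `index_get` and issues one small query per lookup.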
What about using an appropriate service for caching, such as Redis or memcached, instead of loading the huge object into memory on the Python side? That way you'd also be able to scale out to extra machines, should the dictionary grow further.
Anyway, the 100 MB of memory contains all the data plus the hash index and miscellaneous overhead. I've noticed myself that memory often doesn't get deallocated until you quit the Python process (I once filled up a couple of gigs of memory in the interpreter loading a huge JSON object); it would be interesting if anybody has a solution for that.
Your options with only 512 MB of RAM are:
and, in the latter two cases, try splitting up your objects so that you never retrieve megabytes of data from the cache at once.
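Splitting up the object can mean writing one small cache entry per dictionary key instead of one giant entry. A minimal sketch, assuming a dict-like cache with a set() method (in Django this would be django.core.cache.cache; the FakeCache class here is just a stand-in for demonstration):

```python
def warm_cache(cache, key_prefix, big_dict, timeout=None):
    """Write one small cache entry per dictionary key, so a later
    lookup never has to pull the whole dict into memory at once."""
    for name, value in big_dict.items():
        cache.set('%s:%s' % (key_prefix, name), value, timeout)

# Stand-in cache for demonstration; a real deployment would use
# memcached or Redis via Django's cache framework.
class FakeCache(dict):
    def set(self, key, value, timeout=None):
        self[key] = value

cache = FakeCache()
warm_cache(cache, 'projects_map', {'alpha': 1, 'beta': 2})
print(cache.get('projects_map:alpha'))  # -> 1
```

A lookup then fetches a single small entry under its prefixed key, which is exactly the access pattern the LazyCachedDict below relies on.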
You can replace your cached dict with something like this; that way, you can continue treating it like a normal dictionary, but the data is loaded from the cache only when you actually need it.
from django.core.cache import cache
from UserDict import DictMixin

class LazyCachedDict(DictMixin):
    def __init__(self, key_prefix):
        self.key_prefix = key_prefix

    def _make_key(self, name):
        return '%s:%s' % (self.key_prefix, name)

    def __getitem__(self, name):
        # DictMixin expects __getitem__ to raise KeyError on a miss,
        # which is also what makes .get() return None correctly.
        value = cache.get(self._make_key(name))
        if value is None:
            raise KeyError(name)
        return value

    def __setitem__(self, name, value):
        cache.set(self._make_key(name), value)

    def __delitem__(self, name):
        cache.delete(self._make_key(name))

    def has_key(self, name):
        return cache.has_key(self._make_key(name))

    def keys(self):
        ## Just fill the gap, as the cache object doesn't provide
        ## a method to list cache keys..
        return []
And then replace this:
projects_map = cache.get('projects_map')
projects_map.get('search term')
with:
projects_map = LazyCachedDict('projects_map')
projects_map.get('search term')