Loading dictionary object causing memory spike

Tags: python, django

I have a dictionary object with about 60,000 keys that I cache and access in my Django view. The view provides basic search functionality where I look for a search term in the dictionary like so:

projects_map = cache.get('projects_map')
projects_map.get('search term')

However, just grabbing the cached object (in line 1) causes a giant spike in memory usage on the server, sometimes upwards of 100MB, and the memory isn't released even after the values are returned and the template is rendered.

How can I keep memory usage from spiking like this? I've also tried explicitly deleting the object after I grab the value, but even that doesn't release the memory.

Any help is greatly appreciated.

Update: Solution I ultimately implemented

I decided to implement my own indexing table in which I store the keys and their pickled value. Now, instead of using get() on a dictionary, I use:

ProjectsIndex.objects.get(index_key=<search term>)

and unpickle the value. This seems to take care of the memory issue as I'm no longer loading a giant object into memory. It adds another small query to the page but that's about it. Seems to be the perfect solution...for now.
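
For illustration, here is a minimal sketch of that indexing-table idea. It uses the standard library's sqlite3 in place of the Django ORM, because the ProjectsIndex model definition is not shown in the question. The table name, the value column, and the store/lookup helpers are assumptions; only index_key comes from the query above.

```python
import pickle
import sqlite3

# Hypothetical stand-in for the ProjectsIndex table: one row per key,
# with the value stored as a pickled blob.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects_index (index_key TEXT PRIMARY KEY, value BLOB NOT NULL)"
)


def store(key, value):
    """Insert or overwrite the pickled value for one key."""
    conn.execute(
        "INSERT OR REPLACE INTO projects_index (index_key, value) VALUES (?, ?)",
        (key, pickle.dumps(value)),
    )


def lookup(key, default=None):
    """Fetch and unpickle a single row; no other entries are loaded."""
    row = conn.execute(
        "SELECT value FROM projects_index WHERE index_key = ?", (key,)
    ).fetchone()
    return default if row is None else pickle.loads(row[0])


store("search term", {"id": 42, "name": "Example project"})
print(lookup("search term"))
print(lookup("no such term"))
```

Each lookup touches one row, so memory use depends on the size of a single value rather than on the size of the whole dictionary.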

Abid A asked Oct 07 '22 03:10


1 Answer

What about using an appropriate caching service, such as redis or memcached, instead of loading the huge object into memory on the Python side? That way you can also scale out to extra machines should the dictionary grow further.

Anyway, the 100MB of memory contains all the data plus the hash index plus miscellaneous overhead. I noticed the other day that memory often doesn't get deallocated until you quit the Python process (I filled up a couple of gigabytes of memory from the Python interpreter by loading a huge JSON object :)). It would be interesting to know if anybody has a solution for that.
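
To get a rough feel for that overhead, the sketch below measures how much memory unpickling a 60,000-key dictionary allocates, on the assumption that the cache backend serializes values with pickle (as Django's bundled backends generally do). The data set is made up for illustration, so the numbers it prints say nothing about the actual projects_map.

```python
import pickle
import tracemalloc

# Hypothetical data set: 60,000 keys, each mapping to a small record.
projects = {"project-%d" % i: {"id": i, "tags": ["a", "b", "c"]} for i in range(60000)}
blob = pickle.dumps(projects)  # roughly what a pickle-based cache backend stores
del projects

tracemalloc.start()
projects_map = pickle.loads(blob)        # the whole dict is rebuilt in memory
value = projects_map.get("project-123")  # even though only one entry is needed
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("pickled size: %.1f MB" % (len(blob) / 1e6))
print("allocated after unpickling: %.1f MB" % (current / 1e6))
```

The point is that a single lookup still pays for deserializing every entry, which is why the options below all avoid fetching the whole object at once.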

Update: caching with very little memory

Your options with only 512MB of RAM are:

  • Use redis, and have a look here http://redis.io/topics/memory-optimization (but I suspect 512MB isn't enough, even with optimization)
  • Use a separate machine (or a cluster of them, since both memcached and redis support sharding) with much more RAM to hold the cache
  • Use the database cache backend, which is much slower but uses less memory, as it saves everything on disk
  • Use the filesystem cache (although I don't see the point of preferring this over the database cache)

and, in the latter two cases, try splitting up your objects, so that you never retrieve megabytes of objects from the cache at once.
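
One way to split the object up is to group the entries into a fixed number of shards, so that a lookup only fetches the shard that can contain the key. This is a sketch under assumptions: NUM_SHARDS, the cache key format, and the helper names are all hypothetical, and a plain dict stands in for the cache so the example runs on its own.

```python
import zlib

NUM_SHARDS = 64  # assumed value; tune so each shard stays small


def shard_key(prefix, key):
    """Map a dictionary key to the cache key of the shard that holds it."""
    # crc32 is stable across processes, unlike the built-in hash() for str.
    shard = zlib.crc32(key.encode("utf-8")) % NUM_SHARDS
    return "%s:shard:%d" % (prefix, shard)


def split_into_shards(prefix, mapping):
    """Group a large dict into {cache_key: small_dict} pieces."""
    shards = {}
    for key, value in mapping.items():
        shards.setdefault(shard_key(prefix, key), {})[key] = value
    return shards


def lookup(cache, prefix, key, default=None):
    """Fetch only the shard that can contain `key`, then look inside it."""
    shard = cache.get(shard_key(prefix, key)) or {}
    return shard.get(key, default)


# A plain dict stands in for the cache here; with Django, the shards could be
# written with cache.set_many(shards) and read through django.core.cache.cache.
projects = {"project-%d" % i: i for i in range(60000)}
fake_cache = split_into_shards("projects_map", projects)
print(lookup(fake_cache, "projects_map", "project-123"))
```

With 60,000 keys and 64 shards, each fetch deserializes roughly a thousand entries instead of all of them. More shards means smaller fetches at the cost of more cache keys.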

Update: lazy dict spanning multiple cache keys

You can replace your cached dict with something like this; this way, you can continue treating it as you would with a normal dictionary, but data will be loaded from cache only when you really need it.

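from collections.abc import MutableMapping  # Python 3 replacement for UserDict.DictMixin

from django.core.cache import cache

_MISSING = object()  # sentinel, so a cached None still counts as a hit


class LazyCachedDict(MutableMapping):
    """Dict-like wrapper that stores each entry under its own cache key."""

    def __init__(self, key_prefix):
        self.key_prefix = key_prefix

    def _cache_key(self, name):
        return '%s:%s' % (self.key_prefix, name)

    def __getitem__(self, name):
        value = cache.get(self._cache_key(name), _MISSING)
        if value is _MISSING:
            # Raising KeyError lets .get() and "in" behave as on a normal dict.
            raise KeyError(name)
        return value

    def __setitem__(self, name, value):
        cache.set(self._cache_key(name), value)

    def __delitem__(self, name):
        cache.delete(self._cache_key(name))

    def __iter__(self):
        # Just fill the gap, as the cache object doesn't provide
        # a method to list cache keys.
        return iter(())

    def __len__(self):
        return 0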

And then replace this:

projects_map = cache.get('projects_map')
projects_map.get('search term')

with:

projects_map = LazyCachedDict('projects_map')
projects_map.get('search term')
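
For this to work, the entries have to be stored under individual cache keys first, instead of as one big object under 'projects_map'. The sketch below shows one possible way to do that in batches. The populate helper and the InMemoryCache class are hypothetical; the stand-in only mimics the set_many and get calls so the example runs without a Django project, and with Django you would pass django.core.cache.cache instead.

```python
from itertools import islice


class InMemoryCache:
    """Hypothetical stand-in exposing the subset of the cache API used below."""

    def __init__(self):
        self._data = {}

    def set_many(self, data, timeout=None):
        self._data.update(data)

    def get(self, key, default=None):
        return self._data.get(key, default)


def populate(cache, key_prefix, mapping, batch_size=1000):
    """Write every entry of `mapping` under '<key_prefix>:<name>', in batches."""
    items = iter(mapping.items())
    while True:
        batch = {
            "%s:%s" % (key_prefix, name): value
            for name, value in islice(items, batch_size)
        }
        if not batch:
            break
        # With Django's cache, timeout=None keeps the entries indefinitely.
        cache.set_many(batch, timeout=None)


cache = InMemoryCache()
populate(cache, "projects_map", {"search term": 1, "other term": 2})
print(cache.get("projects_map:search term"))
```

Keep in mind that memcached does not accept keys containing whitespace, so search terms with spaces would need to be normalized or hashed before being used as part of a cache key on that backend.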

redShadow answered Oct 10 '22 03:10