Maintain a large dictionary in memory for Django-Python?

I have a big key-value pair dump that I need to look up from my Django (Python) web app.

So I have the following options:

  • Store it as a JSON dump and load it as a Python dict.
  • Store it in a dump.py file and import the dict from it.
  • Use a system targeted at this problem: [are these really meant for this use case?]
    • Memcached
    • Redis
    • Any other option?

Which of the above is the right way to go?

How would you compare Memcached and Redis?

Update:

  • My dictionary is about 5 MB in size and will grow over time.
  • Using Redis/Memcached adds the overhead of hitting a socket every time, so would dump.py be better? It would take time to load into memory, but after that it would only do in-memory lookups.

  • My dictionary needs to be updated every day. dump.py seems to be a problem here, since we would have to restart the Django server to reload it, whereas I guess changes would be reflected on the fly in Redis and Memcached.

  • A system like Redis is normally used when you have a large amount of data and very frequent lookups, but in that case the socket adds overhead, so how do we gain the advantage?

Please share your experiences on this!

asked May 15 '12 by Yugal Jindle

People also ask

How to use Redis in Django?

Django can use django-redis to execute commands in Redis. Looking at our example app in a text editor, we can see the Redis configuration in the settings.py file. We define a default cache with the CACHES setting, using the django-redis cache as our backend.
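
For reference, a minimal sketch of such a settings.py configuration (the Redis URL, database number, and client class shown here are typical defaults, not taken from the question):

# settings.py: hypothetical django-redis cache configuration
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",   # assumed local Redis instance
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}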

What is the default caching mechanism in Django?

Unless we explicitly specify another caching method in our settings file, Django defaults to local memory caching. As its name implies, this method stores cached data in RAM on the machine where Django is running. Local memory caching is fast, responsive, and thread-safe.
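
Spelled out explicitly, that default looks roughly like the sketch below (the LOCATION string is an arbitrary assumption; any unique name works):

# settings.py: Django's built-in local-memory cache backend
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "unique-per-process",   # assumed name; just needs to be unique
    }
}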

How does caching work in Django?

To use the cache in Django, the first thing to do is to set up where the cache will live. The cache framework offers several possibilities: the cache can be stored in the database, on the file system, or directly in memory. This is configured in the settings.py file of your project.
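
Once a backend is configured, reads and writes go through Django's low-level cache API; a minimal sketch (the key names and timeout are assumptions):

from django.core.cache import cache

# store a value for an hour, then read it back
cache.set("greeting", "hello", timeout=3600)
value = cache.get("greeting")            # -> "hello"
missing = cache.get("no-such-key", "")   # default returned on a cache miss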


3 Answers

On choosing Memcached or Redis: both are capable of tens of thousands of requests per second on low-end hardware (e.g. 80,000 req/s for Redis on a Core 2 Duo Q8300), with latencies well below 1 ms. You say you will be doing something on the order of 20 requests a second, so performance-wise it's really a non-issue.

If you choose the dump.py option, you don't need to restart Django to reload it. You can make your own simple reloader:

dump.py:

[ dict code... ]
mtime = 0

Django code:

import os
from importlib import reload   # reload() was a builtin in Python 2
import dump                    # this does nothing if it's already loaded

dump_filename = dump.__file__  # path to dump.py
stat = os.stat(dump_filename)
if stat.st_mtime > dump.mtime:
    reload(dump)               # re-executes dump.py, refreshing the dict
    dump.mtime = stat.st_mtime

answered Oct 05 '22 by vartec

Memcached, though a great product, is trumped by Redis in my book. It offers lots of things that Memcached doesn't, like persistence.

It also offers more complex data structures, like hashes. What is your particular data dump? How big is it, and how large / what type are the values?
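
For a flat key-value dump like the one described, a single Redis hash could hold the whole thing; a minimal sketch with redis-py (the connection details, hash name, and keys are assumptions):

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# load the dump into one hash: field -> value
r.hset("lookup", mapping={"key1": "value1", "key2": "value2"})

# single-key lookup from the web app (returns bytes, or None on a miss)
value = r.hget("lookup", "key1")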

answered Oct 05 '22 by sberry

In the past, for a similar problem, I have used the dump.py idea. I would think that all of the other data stores would require a layer to convert their values into Python objects. However, I still think the choice depends on the data size and the amount of data you are handling. Memcached and Redis should have better indexing and lookup when it comes to really large data sets and things like regex-based lookups. So my recommendation would be:

json -- if you are serving the data over HTTP to some other service

Python file -- if the data structure is not too large and you do not need any special kind of lookups

Memcached and Redis -- if the data becomes really large
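
A minimal sketch of the JSON option (the file name, dict contents, and keys are assumptions): write the dump once from whatever job produces it, then load it at startup and keep it in memory as a plain dict.

import json

big_dict = {"key1": "value1", "key2": "value2"}   # assumed contents of the dump

# write the dump once (e.g. from the daily update job)
with open("dump.json", "w") as f:
    json.dump(big_dict, f)

# load it at startup and keep it in memory for fast lookups
with open("dump.json") as f:
    lookup = json.load(f)

value = lookup.get("key1")   # -> "value1"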

answered Oct 05 '22 by dusual