
How does django handle multiple memcached servers?

In the django documentation it says this:

...

One excellent feature of Memcached is its ability to share cache over multiple servers. This means you can run Memcached daemons on multiple machines, and the program will treat the group of machines as a single cache, without the need to duplicate cache values on each machine. To take advantage of this feature, include all server addresses in LOCATION, either separated by semicolons or as a list.

...

Django's cache framework - Memcached

How exactly does this work? I've read some answers on this site that suggest this is accomplished by sharding across the servers based on hashes of the keys.

Multiple memcached servers question

How does the MemCacheStore really work with multiple servers?

That's fine, but I need a much more specific and detailed answer than that. Using django with pylibmc or python-memcached how is this sharding actually performed? Does the order of IP addresses in the configuration setting matter? What if two different web servers running the same django app have two different settings files with the IP addresses of the memcached servers in a different order? Will that result in each machine using a different sharding strategy that causes duplicate keys and other inefficiencies?

What if a particular machine shows up in the list twice? For example, what if I were to do something like this where 127.0.0.1 is actually the same machine as 172.19.26.240?

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': [
            '127.0.0.1:11211',
            '172.19.26.240:11211',
            '172.19.26.242:11211',
        ]
    }
}

What if one of the memcached servers has more capacity than the others? If machine 1 has a 64MB memcached and machine 2 has a 128MB one, will the sharding algorithm take that into account and give machine 2 a greater proportion of the keys?

I've also read that if a memcached server is lost, then those keys are lost. That is obvious when sharding is involved. What's more important is what will happen if a memcached server goes down and I leave its IP address in the settings file? Will django/memcached simply fail to get any keys that would have been sharded to that failed server, or will it realize that server has failed and come up with a new sharding strategy? If there is a new sharding strategy, does it intelligently take the keys that were originally intended for the failed server and divide them among the remaining servers, or does it come up with a brand new strategy as if the first server didn't exist and result in keys being duplicated?

I tried reading the source code of python-memcached, and couldn't figure this out at all. I plan to try reading the code of libmemcached and pylibmc, but I figured asking here would be easier if someone already knew.

Apreche asked Jul 29 '11 16:07




3 Answers

It's the memcached client library that actually does the sharding. Django only passes the configuration from settings.CACHES to the client.

The order of the servers doesn't matter*, but (at least for python-memcached) you can specify a 'weight' for each of the servers:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': [
            ('cache1.example.org:11211', 1),
            ('cache2.example.org:11211', 10),
        ],
    }
}

I think that a quick look at memcache.py (from python-memcached) and especially memcache.Client._get_server should answer the rest of your questions:

def _get_server(self, key):
    if isinstance(key, tuple):
        serverhash, key = key
    else:
        serverhash = serverHashFunction(key)

    for i in range(Client._SERVER_RETRIES):
        server = self.buckets[serverhash % len(self.buckets)]
        if server.connect():
            #print "(using server %s)" % server,
            return server, key
        serverhash = serverHashFunction(str(serverhash) + str(i))
    return None, None

I would expect that the other memcached clients are implemented in a similar way.
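To make the selection logic above easier to experiment with, here is a minimal standalone sketch. It is not python-memcached itself: build_buckets and pick_server are hypothetical helper names, the CRC32-based server_hash is my assumption about what the library's default server hash looks like, and the "each server appears weight times in the bucket list" behaviour is likewise an assumption about how weights are applied.

```python
import binascii


def server_hash(key):
    """CRC32-based hash (assumed to resemble python-memcached's default)."""
    data = key.encode("utf-8") if isinstance(key, str) else key
    return (((binascii.crc32(data) & 0xFFFFFFFF) >> 16) & 0x7FFF) or 1


def build_buckets(servers):
    """Expand (address, weight) pairs so each address appears `weight` times."""
    buckets = []
    for address, weight in servers:
        buckets.extend([address] * weight)
    return buckets


def pick_server(key, buckets, is_alive=lambda address: True, retries=10):
    """Return the first live server for `key`, rehashing after each failure."""
    h = server_hash(key)
    for i in range(retries):
        address = buckets[h % len(buckets)]
        if is_alive(address):
            return address
        # Same idea as _get_server: derive a new hash and try another bucket.
        h = server_hash(str(h) + str(i))
    return None


# Hypothetical servers, using the weights from the settings example.
buckets = build_buckets([
    ("cache1.example.org:11211", 1),
    ("cache2.example.org:11211", 10),
])
print(len(buckets), pick_server("a", buckets))
```

Under these assumptions a server with weight 10 simply occupies ten of the eleven buckets, so it receives roughly ten times as many keys. When the chosen server is unreachable, the key is rehashed and may land on any other bucket, or the lookup gives up after the retries are exhausted.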


Clarification by @Apreche: The order of servers does matter in one case. If you have multiple web servers, and you want them all to put the same keys on the same memcached servers, you need to configure them with the same server list, in the same order, with the same weights.
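A small sketch of why the order matters, assuming modulo-based bucket selection like the _get_server code shown earlier. The hash function is an assumed CRC32 variant, shard_for is a hypothetical helper, and the addresses are placeholders.

```python
import binascii


def server_hash(key):
    """CRC32-based hash (an assumption, not necessarily the library's exact one)."""
    return (((binascii.crc32(key.encode("utf-8")) & 0xFFFFFFFF) >> 16) & 0x7FFF) or 1


def shard_for(key, servers):
    """Pick a server by position in the list, so list order affects the result."""
    return servers[server_hash(key) % len(servers)]


# Two web servers configured with the same memcached hosts in different order.
web1 = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
web2 = list(reversed(web1))

keys = ["user:%d" % n for n in range(1000)]
mismatched = [k for k in keys if shard_for(k, web1) != shard_for(k, web2)]
print("%d of %d keys map to different servers" % (len(mismatched), len(keys)))
```

Because the server is chosen by index, every key whose hash points at a position that changed between the two lists ends up on a different machine. Each web server then misses on values the other one cached and stores its own copy.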

Jakub Roztocil answered Oct 18 '22 08:10


I tested part of this and found some interesting stuff with django 1.1 and python-memcached 1.44.

On django using 2 memcache servers

cache.set('a', 1, 1000)

cache.get('a') # returned 1

I looked up which memcache server 'a' was sharded to using 2 other django setups each pointing at one of the memcache servers. I simulated a connectivity outage by putting up a firewall between the original django instance and the memcache server that 'a' was stored in.

cache.get('a') # paused for a few seconds and then returned None

cache.set('a', 2, 1000)

cache.get('a') # returned 2 right away

The memcache client library does update its sharding strategy if a server goes down.

Then I removed the firewall.

cache.get('a') # returned 2 for a bit until it detected the server back up then returned 1!

You can read stale data when a memcache server drops and comes back! Memcache doesn't do anything clever to try to prevent this.

This can really mess things up if you're using a caching strategy that puts things in memcache for a long time and depends on cache invalidation to handle updates. An old value can be written to the "normal" cache server for that key; if you then lose connectivity and an invalidation is made during that window, then when the server becomes accessible again you'll read stale data that you shouldn't be able to.
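The sequence above can be reproduced without real servers. This is only a simulation under stated assumptions: FakeServer and ShardingClient are hypothetical in-memory stand-ins, the hash is an assumed CRC32 variant, and the real client's delay before it retries a dead server is left out.

```python
import binascii


def server_hash(key):
    """CRC32-based hash (assumed, for illustration only)."""
    return (((binascii.crc32(key.encode("utf-8")) & 0xFFFFFFFF) >> 16) & 0x7FFF) or 1


class FakeServer:
    """In-memory stand-in for one memcached instance."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reachable = True


class ShardingClient:
    """Chooses a server per key and rehashes when that server is unreachable."""

    def __init__(self, servers, retries=10):
        self.servers = servers
        self.retries = retries

    def _get_server(self, key):
        h = server_hash(key)
        for i in range(self.retries):
            server = self.servers[h % len(self.servers)]
            if server.reachable:
                return server
            h = server_hash(str(h) + str(i))
        return None

    def set(self, key, value):
        server = self._get_server(key)
        if server is not None:
            server.data[key] = value

    def get(self, key):
        server = self._get_server(key)
        return None if server is None else server.data.get(key)


def outage_scenario(key):
    """Replay the set / outage / set / recovery sequence and record each read."""
    client = ShardingClient([FakeServer("one"), FakeServer("two")])
    seen = []
    client.set(key, 1)
    seen.append(client.get(key))   # normal read from the home server
    home = client._get_server(key)
    home.reachable = False         # simulated firewall
    seen.append(client.get(key))   # the fallback server has no value yet
    client.set(key, 2)             # the write lands on the fallback server
    seen.append(client.get(key))
    home.reachable = True          # firewall removed
    seen.append(client.get(key))   # the home server answers with its old value
    return seen


print(outage_scenario("a"))
```

The last read goes back to the key's original server, which never saw the second write, so the old value resurfaces. That is the stale read described above.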

One more note: I've been reading about some object/query caching libraries and I think johnny-cache should be immune to this problem. It doesn't explicitly invalidate entries; instead, it changes the key at which a query is cached when a table changes. So it would never accidentally read old values.

Edit: I think my note about johnny-cache working ok is crap. http://jmoiron.net/blog/is-johnny-cache-for-you/ says "there are extra cache reads on every request to load the current generations". If the generations are stored in the cache itself, the above scenario can cause a stale generation to be read.

Dan Benamy answered Oct 18 '22 08:10


Thought to add this answer two years after the question was asked, since it ranks very highly in search and because we did find a situation where django was talking to only one of the memcached servers.

With a site running on django 1.4.3 and python-memcached 1.51, talking to four memcached instances, we found that the database was being queried far more often than expected. Digging further, we found that cache.get() was returning None for keys that were known to be present in at least one of the memcached instances. When memcached was started with the -vv option, it showed that the question was being asked of only one server!

After a lot of hair had been pulled, we switched the backend to django.core.cache.backends.memcached.PyLibMCCache (pylibmc) and the problem went away.
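For reference, the switch amounts to changing the BACKEND entry in settings. A sketch, with placeholder server addresses rather than our real ones:

```python
# settings.py fragment (sketch); the LOCATION addresses are placeholders.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
        'LOCATION': [
            'cache1.example.org:11211',
            'cache2.example.org:11211',
        ],
    }
}
```

This assumes pylibmc is installed alongside Django.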

e4c5 answered Oct 18 '22 08:10