How does django handle multiple memcached servers?

In the django documentation it says this:

...

One excellent feature of Memcached is its ability to share cache over multiple servers. This means you can run Memcached daemons on multiple machines, and the program will treat the group of machines as a single cache, without the need to duplicate cache values on each machine. To take advantage of this feature, include all server addresses in LOCATION, either separated by semicolons or as a list.

...

Django's cache framework - Memcached

How exactly does this work? I've read some answers on this site that suggest this is accomplished by sharding across the servers based on hashes of the keys.

Multiple memcached servers question

How does the MemCacheStore really work with multiple servers?

That's fine, but I need a much more specific and detailed answer than that. Using django with pylibmc or python-memcached, how is this sharding actually performed? Does the order of IP addresses in the configuration setting matter? What if two different web servers running the same django app have two different settings files with the IP addresses of the memcached servers in a different order? Will that result in each machine using a different sharding strategy, causing duplicate keys and other inefficiencies?

What if a particular machine shows up in the list twice? For example, what if I were to do something like this where 127.0.0.1 is actually the same machine as 172.19.26.240?

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': [
            '127.0.0.1:11211',
            '172.19.26.240:11211',
            '172.19.26.242:11211',
        ]
    }
}

What if one of the memcached servers has more capacity than the others? If machine 1 has a 64MB memcached and machine 2 has a 128MB one, will the sharding algorithm take that into account and give machine 2 a greater proportion of the keys?

I've also read that if a memcached server is lost, then those keys are lost. That is obvious when sharding is involved. What's more important is what will happen if a memcached server goes down and I leave its IP address in the settings file? Will django/memcached simply fail to get any keys that would have been sharded to that failed server, or will it realize that server has failed and come up with a new sharding strategy? If there is a new sharding strategy, does it intelligently take the keys that were originally intended for the failed server and divide them among the remaining servers, or does it come up with a brand new strategy as if the first server didn't exist and result in keys being duplicated?

I tried reading the source code of python-memcached, and couldn't figure this out at all. I plan to try reading the code of libmemcached and pylibmc, but I figured asking here would be easier if someone already knew.

456

asked Jul 29 '11 16:07

Apreche

3 Answers

It's the actual memcached client that does the sharding. Django only passes the configuration from settings.CACHES to the client.

The order of the servers doesn't matter*, but (at least for python-memcached) you can specify a 'weight' for each of the servers:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': [
            ('cache1.example.org:11211', 1),
            ('cache2.example.org:11211', 10),
        ],
    }
}

I think that a quick look at memcache.py (from python-memcached), and especially memcache.Client._get_server, should answer the rest of your questions:

def _get_server(self, key):
    if isinstance(key, tuple):
        serverhash, key = key
    else:
        serverhash = serverHashFunction(key)

    for i in range(Client._SERVER_RETRIES):
        server = self.buckets[serverhash % len(self.buckets)]
        if server.connect():
            #print "(using server %s)" % server,
            return server, key
        serverhash = serverHashFunction(str(serverhash) + str(i))
    return None, None

I would expect that the other memcached clients are implemented in a similar way.

Clarification by @Apreche: The order of servers does matter in one case. If you have multiple web servers and you want them all to put the same keys on the same memcached servers, you need to configure them with the same server list, in the same order, with the same weights.
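To make the bucket selection concrete, here is a minimal sketch of weighted modulo sharding. It is not python-memcached's code: build_buckets and pick_server are hypothetical helpers, zlib.crc32 is only a stand-in for the library's serverHashFunction, and the cache1/cache2 names are placeholders. It assumes each server is repeated in the bucket list once per unit of weight.

```python
import zlib


def build_buckets(servers):
    """Expand (address, weight) pairs into a flat bucket list."""
    buckets = []
    for address, weight in servers:
        # A server with weight N occupies N slots, so it receives
        # roughly N times as many keys as a server with weight 1.
        buckets.extend([address] * weight)
    return buckets


def pick_server(key, buckets):
    """Map a key to a bucket by hashing it modulo the bucket count."""
    key_hash = zlib.crc32(key.encode("utf-8"))  # stand-in hash function
    return buckets[key_hash % len(buckets)]


keys = ["user:%d" % n for n in range(100)]

# Two web servers with the same memcached servers in opposite order.
web1 = build_buckets([("cache1:11211", 1), ("cache2:11211", 1)])
web2 = build_buckets([("cache2:11211", 1), ("cache1:11211", 1)])
disagreements = sum(
    pick_server(k, web1) != pick_server(k, web2) for k in keys
)

# One machine listed under two addresses simply gets two slots.
doubled = build_buckets([
    ("127.0.0.1:11211", 1),
    ("172.19.26.240:11211", 1),
    ("172.19.26.242:11211", 1),
])
```

In this sketch the two web servers disagree on every key, because the same index selects a different server in each list. A machine listed under two addresses takes two of the three slots, and a weight is just more slots; nothing here looks at how much memory a server actually has.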

100

answered Oct 18 '22 08:10

Jakub Roztocil

I tested part of this and found some interesting stuff with django 1.1 and python-memcached 1.44.

On django using 2 memcache servers:

cache.set('a', 1, 1000)

cache.get('a') # returned 1

I looked up which memcache server 'a' was sharded to using 2 other django setups each pointing at one of the memcache servers. I simulated a connectivity outage by putting up a firewall between the original django instance and the memcache server that 'a' was stored in.

cache.get('a') # paused for a few seconds and then returned None

cache.set('a', 2, 1000)

cache.get('a') # returned 2 right away

The memcache client library does update its sharding strategy if a server goes down.

Then I removed the firewall.

cache.get('a') # returned 2 for a bit until it detected the server back up then returned 1!

You can read stale data when a memcache server drops and comes back! Memcache doesn't do anything clever to try to prevent this.

This can really mess things up if you're using a caching strategy that puts things in memcache for a long time and depends on cache invalidation to handle updates. An old value can be written to the "normal" cache server for that key, and if you lose connectivity and an invalidation is made during that window, then when the server becomes accessible again you'll read stale data that you shouldn't be able to.
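The stale read can be reproduced without real servers. The following is a rough simulation, not the library's code: FakeServer and FakeClient are hypothetical, dictionaries stand in for memcached instances, zlib.crc32 stands in for the real hash, and the up flag stands in for a successful connect(). The rehash-and-retry loop is modelled on the _get_server method quoted in the other answer, and any delay before a dead server is retried is ignored.

```python
import zlib


class FakeServer:
    """In-memory stand-in for one memcached instance."""

    def __init__(self, name):
        self.name = name
        self.up = True   # False simulates the firewall
        self.data = {}


class FakeClient:
    RETRIES = 10  # assumed retry count, for illustration only

    def __init__(self, servers):
        self.buckets = list(servers)

    def _hash(self, text):
        return zlib.crc32(text.encode("utf-8"))

    def _get_server(self, key):
        key_hash = self._hash(key)
        for attempt in range(self.RETRIES):
            server = self.buckets[key_hash % len(self.buckets)]
            if server.up:
                return server
            # Unreachable server: rehash and try another bucket.
            key_hash = self._hash(str(key_hash) + str(attempt))
        return None

    def set(self, key, value):
        server = self._get_server(key)
        if server is not None:
            server.data[key] = value

    def get(self, key):
        server = self._get_server(key)
        return None if server is None else server.data.get(key)


servers = [FakeServer("s1"), FakeServer("s2"), FakeServer("s3")]
client = FakeClient(servers)

client.set("a", 1)
home = client._get_server("a")   # the "normal" server for key 'a'

home.up = False                  # firewall goes up
first_read = client.get("a")     # miss: the value lives on the blocked server
client.set("a", 2)               # lands on whichever live server the rehash picks
second_read = client.get("a")    # read back from that replacement server

home.up = True                   # firewall removed
third_read = client.get("a")     # back on the original server: the old value
```

The last read returns the value written before the outage, even though a newer value was written in the meantime. The newer value is still sitting on the replacement server, where nothing will read it until the original server fails again.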

One more note: I've been reading about some object/query caching libraries and I think johnny-cache should be immune to this problem. It doesn't explicitly invalidate entries; instead, it changes the key at which a query is cached when a table changes. So it would never accidentally read old values.

Edit: I think my note about johnny-cache working ok is crap. http://jmoiron.net/blog/is-johnny-cache-for-you/ says "there are extra cache reads on every request to load the current generations". If the generations are stored in the cache itself, the above scenario can cause a stale generation to be read.

answered Oct 18 '22 08:10

Dan Benamy

Thought to add this answer two years after the question was asked, since it ranks very highly in search and because we did find a situation where django was talking to only one of the memcached servers.

With a site running on django 1.4.3 and python-memcached 1.51 talking to four memcached instances, we found that the database was being queried far more often than expected. Digging further, we found that cache.get() was returning None for keys that were known to be present in at least one of the memcached instances. When memcached was started with the -vv option, it showed that the question was asked of only one server!

After a lot of hair had been pulled, we switched the backend to django.core.cache.backends.memcached.PyLibMCCache (pylibmc) and the problem went away.
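For reference, the switch is a one-line change to BACKEND in settings, plus installing pylibmc. The host names below are placeholders, not the ones from our setup:

```python
CACHES = {
    'default': {
        # pylibmc-based backend instead of the python-memcached one
        'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
        'LOCATION': [
            'cache1.example.org:11211',  # placeholder address
            'cache2.example.org:11211',  # placeholder address
        ],
    }
}
```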

answered Oct 18 '22 08:10

e4c5

Tags: python, memcached, django, sharding
