Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memcached consistent hashing not working with 3 of 4 servers down

Story

I have 3 memcached servers running where I shutdown the one or the other to investigate how PHP-memcached behaves upon a server not beeing reachable.

I have defined 4 servers in PHP, 1 to simulate a server that is mostly offline (spare server). When I shutdown 1 server (=> 2 are still online), the third ->get() gives me a result.

When I shutdown one more server (=> 1 is still online), it won't find objects pushed to that last server.

Sample output

First run, 3 of 4 servers up:

Entity not found in cache on 1st try: NOT FOUND
Entity not found in cache on 2nd try: NOT FOUND
Entity not found in cache on 3rd try: NOT FOUND
Entity not found in cache on 4th try: NOT FOUND

Second run, 3 of 4 servers up:

Entity found in Cache: SUCCESS

Third run, 2 of 4 servers up:

Entity not found in cache on 1st try: CONNECTION FAILURE
Entity not found in cache on 2nd try: SERVER IS MARKED DEAD
Entity not found in cache on 3rd try: NOT FOUND
Entity not found in cache on 4th try: NOT FOUND

Fourth run, 1 of 4 servers up:

Entity not found in cache on 1st try: CONNECTION FAILURE
Entity not found in cache on 2nd try: SERVER IS MARKED DEAD
Entity not found in cache on 3rd try: CONNECTION FAILURE
Entity not found in cache on 4th try: SERVER IS MARKED DEAD

Although there is one server left online and I do push my object to memcached everytime it does not find any in cache, it is not able to find the key anymore.

I think it should also work with only a single server left.

Can you explain this behaviour to me?

It looks like it is not possible to implement something that is safe even when I shutdown 19 of 20 servers.

Sidequestion: libketama is not really maintained anymore, is it still good to use it? The logic behind the lib was rather good and is also used in the varnish caching server.

Appendix

My Script:

<?php
require_once 'CachableEntity.php';
require_once 'TestEntity.php';

echo PHP_EOL;

$cache = new Memcached();
$cache->setOption(Memcached::OPT_LIBKETAMA_COMPATIBLE, true);
$cache->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
$cache->setOption(Memcached::OPT_SERVER_FAILURE_LIMIT, 1);
$cache->setOption(Memcached::OPT_REMOVE_FAILED_SERVERS, true);
$cache->setOption(Memcached::OPT_AUTO_EJECT_HOSTS, true);

$cache->setOption(Memcached::OPT_TCP_NODELAY, true);
//$cache->setOption(Memcached::OPT_RETRY_TIMEOUT, 10);

$cache->addServers([
    ['localhost', '11212'],
    ['localhost', '11213'],
    ['localhost', '11214'],
    ['localhost', '11215'], // always offline
]);


$entityId = '/test/test/article_123456789.test';

$entity = new TestEntity($entityId);

$found = false;

$cacheKey = $entity->getCacheKey();

$cacheResult = $cache->get($cacheKey);
if (empty($cacheResult)) {
    echo 'Entity not found in cache on 1st try: ' . $cache->getResultMessage(), PHP_EOL;
    
    $cacheResult = $cache->get($cacheKey);
    if (empty($cacheResult)) {
        echo 'Entity not found in cache on 2nd try: ' . $cache->getResultMessage(), PHP_EOL;
        
        $cacheResult = $cache->get($cacheKey);
        if (empty($cacheResult)) {
            echo 'Entity not found in cache on 3rd try: ' . $cache->getResultMessage(), PHP_EOL;
            
            $cacheResult = $cache->get($cacheKey);
            if (empty($cacheResult)) {
                echo 'Entity not found in cache on 4th try: ' . $cache->getResultMessage(), PHP_EOL;
                
                $entity
                    ->setTitle('TEST')
                    ->setText('Hellow w0rld. Lorem Orem Rem Em M IpsuM')
                    ->setUrl('http://www.google.com/content-123456789.html');
                
                $cache->set($cacheKey, $entity->serialize(), 120);
            }
        }
        else { $found = true; }
    }
    else { $found = true; }
}
else { $found = true; }


if ($found === true) {
    echo 'Entity found in Cache: ' . $cache->getResultMessage(), PHP_EOL;
    $entity->unserialize($cacheResult);
    echo 'Title: ' . $entity->getTitle(), PHP_EOL;
}

echo PHP_EOL;
like image 703
Daniel W. Avatar asked Dec 20 '17 16:12

Daniel W.


1 Answers

  • The behaviour you are experiencing is consistent. When a server is not available it's first marked with a failure and then marked as dead.

The problem is that apparently it would only be coherent if you had set Memcached::OPT_SERVER_FAILURE_LIMIT value to 2 while you have set this to 1. This would have explained why you have two error lines per unreachable server (CONNECTION FAILURE, SERVER IS MARKED AS DEAD)

This seems to be related with timeout. Adding a usleep() after a failure with a matching OPT_RETRY_TIMEOUT value will enable the server to be dropped from the list (see the following bug comment)

  • The value does not replicate to the next server because only keys are distributed.

  • Note that OPT_LIBKETAMA_COMPATIBLE does not use libketama, but only reproduces the same algorithm, which means that it does not matter if libketama is no longer active while this is the recommended configuration in PHP documentation:

It is highly recommended to enable this option if you want to use consistent hashing, and it may be enabled by default in future releases.

EDIT: In my understanding of your post, the message "Entity found in Cache: SUCCESS" only appears on the second run (1 server offline) because there's no change from the previous command and the server hosting this key is still available (so memcached consider from the key that the value is stored on either the 1st, 2nd or 3rd server). Let's call those servers John, George, Ringo and Paul.

In the third run, at start, memcached deduces from the key which one of the four servers owns the value (e.g. John). It asks John twice before giving up because it's now off. Its algorithm then only considers 3 servers (not knowing that Paul is already dead) and deduces that George should contain the value.

George answers twice that it does not contain the value and then store it.

But on the fourth run, John, George and Paul are off. Memcached tries John twice, and then tries George twice. It then stores in Ringo.

The problem here is that the unavailable servers are not memorized between different runs, and that within the same run you have to ask twice a server before it's removed.

like image 54
Adam Avatar answered Oct 27 '22 08:10

Adam