Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Caching & avoiding Cache Stampedes - multiple simultaneous calculations

Tags:

caching

perl

We have a very expensive calculation that we'd like to cache. So we do something similar to:

my $result = $cache->get( $key );

unless ($result) {
    $result = calculate( $key );
    $cache->set( $key, $result, '10 minutes' );
}

return $result;

Now, during calculate($key), before we store the result in the cache, several other requests come in, that also start running calculate($key), and system performance suffers because many processes are all calculating the same thing.

Idea: Lets put a flag in the cache that a value is being calculated, so the other requests just wait for that one calculation to finish, so they all use it. Something like:

my $result = $cache->get( $key );

if ($result) {
    while ($result =~ /Wait, \d+ is running calculate../) {
        sleep 0.5;
        $result = $cache->get( $key );
    }
} else {
    $cache->set( $key, "Wait, $$ is running calculate()", '10 minutes' );
    $result = calculate( $key );
    $cache->set( $key, $result, '10 minutes' );
}


return $result;

Now that opens up a whole new can of worms. What if $$ dies before it sets the cache. What if, what if... All of them solvable, but since there is nothing in CPAN that does this (there is something in CPAN for everything), I start wondering:

Is there a better approach? Is there a particular reason e.g. Perl's Cache and Cache::Cache classes don't provide some mechanism like this? Is there a tried and true pattern I could use instead?

Ideal would be a CPAN module with a debian package already in squeeze or a eureka moment, where I see the error of my ways... :-)

EDIT: I have since learned that this is called a Cache stampede and have updated the question's title.

like image 428
Peter V. Mørch Avatar asked May 22 '12 21:05

Peter V. Mørch


3 Answers

flock() it.

Since your worker processes are all on the same system, you can probably use good, old-fashioned file locking to serialize the expensive calculate()ions. As a bonus, this technique appears in several of the core docs.

use Fcntl qw(:DEFAULT :flock);    # warning:  this code not tested

use constant LOCKFILE => 'you/customize/this/please';

my $result = $cache->get( $key );

unless ($result) {
    # Get an exclusive lock
    my $lock;
    sysopen($lock, LOCKFILE, O_WRONLY|O_CREAT) or die;
    flock($lock, LOCK_EX) or die;

    # Did someone update the cache while we were waiting?
    $result = $cache->get( $key );

    unless ($result) {
        $result = calculate( $key );
        $cache->set( $key, $result, '10 minutes' );
    }

    # Exclusive lock released here as $lock goes out of scope
}

return $result;

Benefit: worker death will instantly release the $lock.

Risk: LOCK_EX can block forever, and that is a long time. Avoid SIGSTOPs, perhaps get comfortable with alarm().

Extension: if you don't want to serialize all calculate() calls, but merely all calls for the same $key or some set of keys, your workers can flock() /some/lockfile.$key_or_a_hash_of_the_key.

like image 187
pilcrow Avatar answered Nov 15 '22 03:11

pilcrow


Use lock? Or maybe that would be an overkill? Or if it is possible, precalculate the result offline then use it online?

like image 39
holygeek Avatar answered Nov 15 '22 02:11

holygeek


Although it may (or may not) be overkill for your use case, have you considered using a message queue for the processing? RabbitMQ seems to be a popular choice in the Perl community at the moment and it is supported through the AnyEvent::RabbitMQ module.

The basic strategy in this case would be to submit a request to the message queue whenever you need to calculate a new key. The queue could then be set to calculate only a single key at a time (in the order requested) if that's all you can reliably handle. Alternately, if you can safely compute multiple keys concurrently, the queue can also be used to consolidate multiple requests for the same key, computing it once and returning the result to all clients who requested that key.

Of course, this would add a bit of complexity and AnyEvent calls for a somewhat different programming style than you may be used to (I would offer an example, but I've never really gotten the hang of it myself), but it may offer sufficient gains in efficiency and reliability to make those costs worth your while.

like image 30
Dave Sherohman Avatar answered Nov 15 '22 03:11

Dave Sherohman