We have a very expensive calculation that we'd like to cache. So we do something similar to:
my $result = $cache->get( $key );
unless ($result) {
$result = calculate( $key );
$cache->set( $key, $result, '10 minutes' );
}
return $result;
Now, during calculate($key)
, before we store the result in the cache, several other requests come in, that also start running calculate($key)
, and system performance suffers because many processes are all calculating the same thing.
Idea: Lets put a flag in the cache that a value is being calculated, so the other requests just wait for that one calculation to finish, so they all use it. Something like:
my $result = $cache->get( $key );
if ($result) {
while ($result =~ /Wait, \d+ is running calculate../) {
sleep 0.5;
$result = $cache->get( $key );
}
} else {
$cache->set( $key, "Wait, $$ is running calculate()", '10 minutes' );
$result = calculate( $key );
$cache->set( $key, $result, '10 minutes' );
}
return $result;
Now that opens up a whole new can of worms. What if $$ dies before it sets the cache. What if, what if... All of them solvable, but since there is nothing in CPAN that does this (there is something in CPAN for everything), I start wondering:
Is there a better approach? Is there a particular reason e.g. Perl's Cache
and Cache::Cache
classes don't provide some mechanism like this? Is there a tried and true pattern I could use instead?
Ideal would be a CPAN module with a debian package already in squeeze or a eureka moment, where I see the error of my ways... :-)
EDIT: I have since learned that this is called a Cache stampede and have updated the question's title.
flock()
it.
Since your worker processes are all on the same system, you can probably use good, old-fashioned file locking to serialize the expensive calculate()
ions. As a bonus, this technique appears in several of the core docs.
use Fcntl qw(:DEFAULT :flock); # warning: this code not tested
use constant LOCKFILE => 'you/customize/this/please';
my $result = $cache->get( $key );
unless ($result) {
# Get an exclusive lock
my $lock;
sysopen($lock, LOCKFILE, O_WRONLY|O_CREAT) or die;
flock($lock, LOCK_EX) or die;
# Did someone update the cache while we were waiting?
$result = $cache->get( $key );
unless ($result) {
$result = calculate( $key );
$cache->set( $key, $result, '10 minutes' );
}
# Exclusive lock released here as $lock goes out of scope
}
return $result;
Benefit: worker death will instantly release the $lock
.
Risk: LOCK_EX can block forever, and that is a long time. Avoid SIGSTOPs, perhaps get comfortable with alarm()
.
Extension: if you don't want to serialize all calculate()
calls, but merely all calls for the same $key
or some set of keys, your workers can flock()
/some/lockfile.$key_or_a_hash_of_the_key
.
Use lock? Or maybe that would be an overkill? Or if it is possible, precalculate the result offline then use it online?
Although it may (or may not) be overkill for your use case, have you considered using a message queue for the processing? RabbitMQ seems to be a popular choice in the Perl community at the moment and it is supported through the AnyEvent::RabbitMQ module.
The basic strategy in this case would be to submit a request to the message queue whenever you need to calculate
a new key. The queue could then be set to calculate
only a single key at a time (in the order requested) if that's all you can reliably handle. Alternately, if you can safely compute multiple keys concurrently, the queue can also be used to consolidate multiple requests for the same key, computing it once and returning the result to all clients who requested that key.
Of course, this would add a bit of complexity and AnyEvent calls for a somewhat different programming style than you may be used to (I would offer an example, but I've never really gotten the hang of it myself), but it may offer sufficient gains in efficiency and reliability to make those costs worth your while.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With