
Inconsistent data with a failed cache node

I'm encountering a problem where data in my database is getting reverted to an old state. I think I have narrowed the problem down to this situation.

Imagine a sequence of two purchases occurring like this:

  • All cache nodes are working
  • A user logs on (their data is pulled from the DB and stored in memcached)
  • A cache node goes down
  • The user continues to browse (and since their data cannot be found in the cache it is pulled from the DB and stored in memcached)
  • The user performs some action that transforms their record, e.g. leveling up (their record is updated in the cache and the database)
  • The cache node comes back up
  • We pull the user's data from the cache again and it comes from the original cache node that was previously down
  • Now we have a problem: the record on that cache node is out of date!
  • The user performs another action that transforms their record
  • This is saved in the cache and the database, but since it was based on an out-of-date record it stomps on the previous change and effectively reverts it

We have now lost data because the database record was overwritten with partially out-of-date information.
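The sequence above can be reproduced with a small simulation. This is a minimal sketch in Python rather than PHP, with in-memory dictionaries standing in for memcached and the database. The `Node` class, the helper functions, and the `level`/`gold` fields are all hypothetical names chosen for illustration, and the "hash over live nodes" rule is only an assumed model of failover, not libmemcached's actual distribution algorithm.

```python
import zlib


class Node:
    """One in-memory cache node that can be switched off (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.up = True


def pick_node(nodes, key):
    # Assumed failover model: hash the key over the nodes that are up.
    live = [n for n in nodes if n.up]
    return live[zlib.crc32(key.encode()) % len(live)]


def load(nodes, db, key):
    node = pick_node(nodes, key)
    if key not in node.data:
        node.data[key] = dict(db[key])  # cache miss: copy the row from the DB
    return dict(node.data[key])


def save(nodes, db, key, record):
    # The whole record is written back to both the cache and the database.
    pick_node(nodes, key).data[key] = dict(record)
    db[key] = dict(record)


def run_scenario():
    key = "user:1"
    db = {key: {"level": 1, "gold": 10}}
    nodes = [Node(), Node()]

    load(nodes, db, key)          # login: record cached on its home node
    home = pick_node(nodes, key)
    home.up = False               # the home node goes down

    record = load(nodes, db, key)  # re-cached on the surviving node
    record["level"] = 2            # first action: level up
    save(nodes, db, key, record)

    home.up = True                 # the home node returns with its old copy
    record = load(nodes, db, key)  # stale read: level is 1 again
    record["gold"] = 20            # second action
    save(nodes, db, key, record)
    return db[key]
```

Here `run_scenario()` returns `{'level': 1, 'gold': 20}`: the second write carries the stale `level` back into the database and the level-up is lost.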

How can I prevent this using PHP5 and libmemcached with persistent connections? I think what I want is for a cache node not to fail over at all; reads and writes to that node should simply fail, but the node should stay in the pool so that I don't end up with duplicate records.

This will increase load on my database by 1/n (where n is the total number of cache nodes) when a node goes down but it's better than ending up with inconsistent data.

Unfortunately I'm having trouble understanding what settings I should change to get this behavior.

Brad Dwyer asked Jan 08 '16 14:01



2 Answers

I like the versioning and optimistic lock approach implemented in Doctrine ORM. You can do the same. It won't increase load on your database, but will require some refactoring.

Basically, you add a version number column to every table you are caching, change your update queries to increment it (version = version + 1), and add a WHERE version = $version condition (note that $version comes from your PHP/memcached copy of the record). You then need to check the number of affected rows and throw an exception if it is 0.
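The versioned update described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module instead of PHP and MySQL so that it runs on its own; the SQL pattern is the part that carries over. The `users` table, its columns, and the `StaleRecordError` and `update_level` names are assumptions made for the example.

```python
import sqlite3


class StaleRecordError(Exception):
    """Raised when the row changed since the cached copy was read."""


def update_level(conn, user_id, new_level, cached_version):
    # The WHERE clause only matches if nobody bumped the version
    # after the cached copy was taken.
    cur = conn.execute(
        "UPDATE users SET level = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_level, user_id, cached_version),
    )
    conn.commit()
    if cur.rowcount == 0:
        # Zero affected rows: the cached version is out of date
        # (or the row does not exist at all).
        raise StaleRecordError(f"user {user_id} is not at version {cached_version}")
    return cached_version + 1


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, level INTEGER, version INTEGER)"
    )
    conn.execute("INSERT INTO users VALUES (1, 1, 1)")

    stale_version = 1  # what an out-of-date cache node would still hold
    new_version = update_level(conn, 1, new_level=2, cached_version=1)

    try:
        update_level(conn, 1, new_level=1, cached_version=stale_version)
        rejected = False
    except StaleRecordError:
        rejected = True

    level, version = conn.execute(
        "SELECT level, version FROM users WHERE id = 1"
    ).fetchone()
    return new_version, rejected, level, version
```

In `demo()`, the second update is based on the stale version 1, matches no rows, and is rejected, so the row stays at level 2, version 2.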

How you handle that exception is up to you. You can simply invalidate the cache for this record and ask the user to re-submit the form, or you can try to merge the changes. At that point you have the stale data from the cache, the update from the user input, and the fresh data from the DB, so the only unrecoverable case is a column that has three different values.
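One way to read the merge option is as a column-by-column three-way merge. The sketch below is a hypothetical Python helper, not something Doctrine provides; it assumes each record is a flat dictionary with the same columns in all three copies, and the `three_way_merge` and `MergeConflict` names are made up for the example.

```python
class MergeConflict(Exception):
    """Raised when a column has three different values."""


def three_way_merge(stale, submitted, fresh):
    """Merge the user's changes onto the fresh DB row.

    stale:     the out-of-date copy the user's form was based on
    submitted: the record as the user wants to save it
    fresh:     the current row re-read from the database
    """
    merged = {}
    for column in fresh:
        base, mine, theirs = stale[column], submitted[column], fresh[column]
        if mine == base:
            merged[column] = theirs  # user did not touch it: keep the DB value
        elif theirs == base or theirs == mine:
            merged[column] = mine    # only the user changed it, or both agree
        else:
            raise MergeConflict(column)  # three different values
    return merged
```

For example, with `stale = {"level": 1, "gold": 10}`, `submitted = {"level": 1, "gold": 20}` and `fresh = {"level": 2, "gold": 10}`, the merge yields `{"level": 2, "gold": 20}`. If the database had meanwhile set `gold` to 30, the helper would raise `MergeConflict` for that column.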

Alex Blex answered Oct 01 '22 23:10


You are making the problem more complex than it needs to be. A simpler approach is to mark the cache node dirty when it fails and rebuild it, rather than putting it back into service with inconsistent data on it.
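As a rough sketch of that idea, assuming an in-memory stand-in for a cache node (the `Node` class, its `dirty` flag, and `read_through` are hypothetical names, written in Python for brevity): the node is flagged when it drops out and emptied before it serves reads again. With a real memcached server the equivalent step would be flushing the node, for example with its flush_all command, before returning it to the pool.

```python
class Node:
    """In-memory stand-in for one cache node (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.up = True
        self.dirty = False

    def go_down(self):
        self.up = False
        self.dirty = True  # entries may miss updates from now on

    def come_back(self):
        if self.dirty:
            self.data.clear()  # drop everything that might be stale
            self.dirty = False
        self.up = True


def read_through(node, db, key):
    # On a miss, rebuild the entry from the database.
    if key not in node.data:
        node.data[key] = dict(db[key])
    return node.data[key]


def demo():
    db = {"user:1": {"level": 1}}
    node = Node()
    read_through(node, db, "user:1")  # cached at level 1
    node.go_down()
    db["user:1"] = {"level": 2}       # an update the node never saw
    node.come_back()                  # the stale entry is dropped here
    return read_through(node, db, "user:1")
```

`demo()` returns `{'level': 2}`, because the first read after the node rejoins is a miss and is rebuilt from the database.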

Allen answered Oct 02 '22 00:10