
Stack Overflow, Redis, and Cache invalidation

Now that Stack Overflow uses redis, do they handle cache invalidation the same way? i.e. a list of identities hashed to a query string + name (I guess the name is some kind of purpose or object type name).

Perhaps they then retrieve individual items that are missing from the cache directly by id (which bypasses a bunch of database indexes and uses the more efficient clustered index instead perhaps). That'd be smart (the rehydration that Jeff mentions?).

Right now, I'm struggling to find a way to pivot all of this in a succinct way. Are there any examples of this kind of thing that I could use to help clarify my thinking prior to doing a first cut myself?

Also, I'm wondering where the cutoff is between using a .NET cache (System.Runtime.Caching or System.Web.Caching) and going out to Redis. Or is Redis just hands down faster?

Here's the original SO question from 2009:

https://meta.stackexchange.com/questions/6435/how-does-stackoverflow-handle-cache-invalidation

A couple of other links:

https://meta.stackexchange.com/questions/69164/does-stackoverflow-use-caching-and-if-so-how/69172#69172

https://meta.stackexchange.com/questions/110320/stack-overflow-db-performance-and-redis-cache

sgtz asked Mar 07 '12 06:03


1 Answer

I honestly can't decide if this is a SO question or a MSO question, but:

Going off to another system is never faster than querying local memory (as long as it is keyed); simple answer: we use both! So we use:

  • local memory
  • else check redis, and update local memory
  • else fetch from source, and update redis and local memory
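The three steps above can be sketched roughly as follows. This is a minimal illustration, not the actual Stack Overflow code: `ISharedCache`, `FakeSharedCache`, `TieredCache` and `DropLocal` are hypothetical names, and the shared tier is hidden behind an interface (with an in-memory stand-in) so the sketch runs without a redis server.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical abstraction over the shared tier (redis in the answer).
// This is NOT BookSleeve's API; it only keeps the sketch self-contained.
public interface ISharedCache
{
    bool TryGet(string key, out byte[] value);
    void Set(string key, byte[] value);
}

// In-memory stand-in for the shared tier, so the sketch runs without redis.
public sealed class FakeSharedCache : ISharedCache
{
    private readonly ConcurrentDictionary<string, byte[]> store =
        new ConcurrentDictionary<string, byte[]>();

    public bool TryGet(string key, out byte[] value) { return store.TryGetValue(key, out value); }
    public void Set(string key, byte[] value) { store[key] = value; }
}

public sealed class TieredCache
{
    // Per-process ("local memory") tier.
    private readonly ConcurrentDictionary<string, byte[]> local =
        new ConcurrentDictionary<string, byte[]>();
    private readonly ISharedCache shared;

    public TieredCache(ISharedCache shared) { this.shared = shared; }

    // Looks the key up in local memory, then the shared cache, then the source.
    public byte[] Get(string key, Func<string, byte[]> fetchFromSource)
    {
        byte[] value;

        // 1. local memory
        if (local.TryGetValue(key, out value)) return value;

        // 2. else check the shared cache, and update local memory
        if (shared.TryGet(key, out value))
        {
            local[key] = value;
            return value;
        }

        // 3. else fetch from source, and update the shared cache and local memory
        value = fetchFromSource(key);
        shared.Set(key, value);
        local[key] = value;
        return value;
    }

    // Drops only this process's copy; the next Get falls through to the shared tier.
    public bool DropLocal(string key)
    {
        byte[] ignored;
        return local.TryRemove(key, out ignored);
    }
}
```

With two `TieredCache` instances sharing one `ISharedCache`, the first `Get` on one instance hits the source, a repeat `Get` on that instance is served from local memory, and a `Get` on the other instance is served from the shared tier.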

This then, as you say, raises the issue of cache invalidation, although in most places that isn't actually critical. Where it does matter, redis events (pub/sub) give an easy way to broadcast the keys that are changing to all nodes, so they can drop their local copy; the next time the key is needed, the node picks up the new copy from redis. So we broadcast the key names that are changing on a single event channel.

Tools: redis on ubuntu server; BookSleeve as a redis wrapper; protobuf-net and GZipStream (enabled / disabled automatically depending on size) for packaging data.
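To illustrate what "enabled / disabled automatically depending on size" could look like, here is a rough sketch that gzips a payload only above a cutoff and records the choice in a leading flag byte. The `Packager` class, the one-byte flag layout and the 1024-byte threshold are all assumptions made for the illustration; the answer does not describe the real format or cutoff. The input is taken to be bytes that some serializer (protobuf-net in the answer) has already produced.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class Packager
{
    // Assumed cutoff; the answer does not say what size triggers compression.
    public const int CompressThreshold = 1024;

    // Assumed one-byte header recording whether the body is gzipped.
    private const byte Raw = 0;
    private const byte GZipped = 1;

    // payload: bytes that are already serialized (e.g. by protobuf-net).
    public static byte[] Pack(byte[] payload)
    {
        using (var output = new MemoryStream())
        {
            if (payload.Length < CompressThreshold)
            {
                // Small payloads are stored as-is after the flag byte.
                output.WriteByte(Raw);
                output.Write(payload, 0, payload.Length);
            }
            else
            {
                output.WriteByte(GZipped);
                // leaveOpen: true keeps 'output' usable after the GZipStream is
                // disposed; disposing is what flushes the final compressed block.
                using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
                {
                    gzip.Write(payload, 0, payload.Length);
                }
            }
            return output.ToArray();
        }
    }

    public static byte[] Unpack(byte[] packed)
    {
        if (packed == null || packed.Length == 0)
            throw new ArgumentException("Empty package", "packed");

        // Skip the flag byte; the stream covers only the body.
        using (var input = new MemoryStream(packed, 1, packed.Length - 1))
        {
            if (packed[0] == Raw) return input.ToArray();
            if (packed[0] != GZipped) throw new InvalidDataException("Unknown flag byte");

            using (var gzip = new GZipStream(input, CompressionMode.Decompress))
            using (var result = new MemoryStream())
            {
                gzip.CopyTo(result);
                return result.ToArray();
            }
        }
    }
}
```

The flag byte lets the reader decide how to unpack without guessing, so the threshold can change later without breaking values already in the cache.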

So: the redis pub/sub events are used to invalidate the cache for a given key from one node (the one that knows the state has changed) immediately (pretty much) to all nodes.

Regarding distinct processes (from comments, "do you use any kind of shared memory model for multiple distinct processes feeding off the same data?"): no, we don't do that. Each web-tier box is only really hosting one process (of any given tier), with multi-tenancy within that, so inside the same process we might have 70 sites. For legacy reasons (i.e. "it works and doesn't need fixing") we primarily use the http cache with the site-identity as part of the key.

For the few massively data-intensive parts of the system, we have mechanisms to persist to disk so that the in-memory model can be passed between successive app-domains as the web naturally recycles (or is re-deployed), but that is unrelated to redis.

Here's a related example that shows only the broad flavour of how this might work - spin up a number of instances of the following, and then type some key names in:

    using System;
    using System.Text;
    using BookSleeve;

    static class Program
    {
        static void Main()
        {
            // single channel on which changing key names are broadcast
            const string channelInvalidate = "cache/invalidate";
            using(var pub = new RedisConnection("127.0.0.1"))
            using(var sub = new RedisSubscriberConnection("127.0.0.1"))
            {
                pub.Open();
                sub.Open();

                // every instance hears every key that any instance publishes
                sub.Subscribe(channelInvalidate, (channel, data) =>
                {
                    string key = Encoding.UTF8.GetString(data);
                    Console.WriteLine("Invalidated {0}", key);
                });
                Console.WriteLine(
                        "Enter a key to invalidate, or an empty line to exit");
                string line;
                do
                {
                    line = Console.ReadLine();
                    if(!string.IsNullOrEmpty(line))
                    {
                        pub.Publish(channelInvalidate, line);
                    }
                } while (!string.IsNullOrEmpty(line));
            }
        }
    }

What you should see is that when you type a key-name, it is shown immediately in all the running instances, which would then dump their local copy of that key. Obviously in real use the two connections would need to be put somewhere and kept open, so would not be in using statements. We use an almost-a-singleton for this.
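The example above only prints the key; a rough sketch of the "dump their local copy" step might look like the following. `LocalCacheNode` and `FakeInvalidationChannel` are hypothetical names, and the fake channel is an in-process, single-threaded stand-in for the redis channel so the sketch runs on its own. `OnInvalidate` has the same (channel, data) shape as the callback in the example, so it should be usable as the subscription handler in place of the lambda, though that wiring is an assumption and is not shown here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Text;

// One per process: the local-memory tier that must react to invalidations.
public sealed class LocalCacheNode
{
    private readonly ConcurrentDictionary<string, object> items =
        new ConcurrentDictionary<string, object>();

    public void Set(string key, object value) { items[key] = value; }

    public bool TryGet(string key, out object value) { return items.TryGetValue(key, out value); }

    // Same shape as the subscription callback in the example above:
    // the channel name plus the UTF-8 encoded name of the key that changed.
    public void OnInvalidate(string channel, byte[] data)
    {
        string key = Encoding.UTF8.GetString(data);
        object ignored;
        // Removing a key that is not cached locally is a harmless no-op.
        items.TryRemove(key, out ignored);
    }
}

// Hypothetical in-process stand-in for the redis pub/sub channel.
// Single-threaded on purpose; it only exists to exercise the handler.
public sealed class FakeInvalidationChannel
{
    private readonly string name;
    private readonly List<Action<string, byte[]>> subscribers =
        new List<Action<string, byte[]>>();

    public FakeInvalidationChannel(string name) { this.name = name; }

    public void Subscribe(Action<string, byte[]> handler) { subscribers.Add(handler); }

    // Broadcasts the changed key name to every subscriber, including the sender.
    public void Publish(string key)
    {
        byte[] data = Encoding.UTF8.GetBytes(key);
        foreach (var handler in subscribers) handler(name, data);
    }
}
```

Subscribing two `LocalCacheNode` instances to one `FakeInvalidationChannel` and publishing a key name removes that key from both nodes and leaves their other entries alone, which mirrors what each running instance of the console example would do on receiving a key.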

Marc Gravell answered Sep 20 '22 15:09