Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to avoid database query storms using cache-aside pattern

We are using a PostgreSQL database and AppFabric Server, running a moderately busy ASP.NET MVC e-commerce site.

Following the cache-aside pattern we request data from our cache, and if it is not available, we query the database.

This approach results in 'query storms' where the database recieves multiple queries for the same data in a short space of time, while a given object in the cache is being refreshed. This issue is exacerbated by longer running queries, and obviously multiple requests for the same data can cause the query to run longer, forming an unpleasant feedback loop.

One solution to this problem is to use read-locking on the cache. However this can itself cause performance issues in a web farm situation (or even on a single busy web server) as web servers are blocked on reads for no reason, in case there is a database query taking place.

Another solution is to drop the cache-aside pattern and seed the cache independently. This is the approach we have taken to mitigate the immediate issues we are seeing with this problem, however it is not possible with all data.

Am I missing something here? And what other approaches have people taken to avoid this behaviour?

like image 979
jamiecon Avatar asked Nov 14 '22 01:11

jamiecon


1 Answers

Depending on the number of servers you have and your current cache architecture it may be worthwhile to evaluate adding a server-level (or in-process) cache as well. In effect you use this as a fallback cache, and it's especially helpful where hitting the primary storage (database) is either very resource intensive or slow.

When I've used this I've used the cache-aside pattern for the primary cache and a read-through design for the secondary--in which the secondary is locking and ensures the database isn't over-saturated by the same request. With this architecture a primary cache-miss results in at most one query per entity per server (or process) to the database.

So the basic workflow is:

1) Try to retrieve from primary / shared cache pool

* If successful, return
* If unsuccessul, continue

2) Check in-process cache for value

* If successful, return (optionally seeding primary cache)
* If unsuccessul, continue

3) Get lock by cache key (and double-check in-process cache, in case it's been added by another thread)

4) Retrieve object from primary persistence (db)

5) Seed in-process cache and return

I've done this using injectable wrappers, my cache layers all implement the relevant IRepository interface, and StructureMap injects the correct stack of caches. This keeps the actual cache behaviors flexible, focused, and easy to maintain despite being fairly complex.

like image 93
STW Avatar answered Dec 09 '22 17:12

STW