Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combining cache methods - memcache/disk based

Here's the deal. We would have taken the complete static html road to solve performance issues, but since the site will be partially dynamic, this won't work out for us. What we have thought of instead is using memcache + eAccelerator to speed up PHP and take care of caching for the most used data.

Here's our two approaches that we have thought of right now:

  • Using memcache on >>all<< major queries and leaving it alone to do what it does best.

  • Usinc memcache for most commonly retrieved data, and combining with a standard harddrive-stored cache for further usage.

The major advantage of only using memcache is of course the performance, but as users increases, the memory usage gets heavy. Combining the two sounds like a more natural approach to us, even though the theoretical compromize in performance. Memcached appears to have some replication features available as well, which may come handy when it's time to increase the nodes.

What approach should we use? - Is it stupid to compromize and combine the two methods? Should we insted be focusing on utilizing memcache and instead focusing on upgrading the memory as the load increases with the number of users?

Thanks a lot!

like image 666
Industrial Avatar asked Mar 24 '10 18:03

Industrial


1 Answers

Compromize and combine this two method is a very clever way, I think.

The most obvious cache management rule is latency v.s. size rule, which is used in CPU cached also. In multi level caches each next level should have more size for compensating higher latency. We have higher latency but higher cache hit ratio. So, I didn't recommend you to place disk based cache in front of memcache. Сonversely it's should be place behind memcache. The only exception is if you cache directory mounted in memory (tmpfs). In this case file based cache could compensate high load on memcache, and also could have latency profits (because of data locality).

This two storages (file based, memcache) are not only storages that are convenient for cache. You also could use almost any KV database as they are very good at concurrency control.

Cache invalidation is separate question which can engage your attention. There are several tricks you could use to provide more subtle cache update on cache misses. One of them is dog pile effect prediction. If several concurrent threads got cache miss simultaneously all of them go to backend (database). Application should allow only one of them to proceed and rest of them should wait on cache. Second is background cache update. It's nice to update cache not in web request thread but in background. In background you can control concurrency level and update timeouts more gracefully.

Actually there is one cool method which allows you to do tag based cache tracking (memcached-tag for example). It's very simple under the hood. With every cache entry you save a vector of tags versions which it is belongs to (for example: {directory#5: 1, user#8: 2}). When you reading cache line you also read all actual vector numbers from memcached (this could be effectively performed with multiget). If at least one actual tag version is greater than tag version saved in cache line then cache is invalidated. And when you change objects (for example directory) appropriate tag version should be incremented. It's very simple and powerful method, but have it's own disadvantages, though. In this scheme you couldn't perform efficient cache invalidation. Memcached could easily drop out live entries and keep old entries.

And of course you should remember: "There are only two hard things in Computer Science: cache invalidation and naming things" - Phil Karlton.

like image 123
Denis Bazhenov Avatar answered Sep 28 '22 14:09

Denis Bazhenov