I'm pretty new to caching strategies and implementations. I'm working on a project that will be database intensive, but also have information being updated and changed very regularly.
I've found enough info to know generally how to develop the caching function, but what I'm unsure about is the general strategy.
If I cache all query results and group them by logical things that I can clear on triggers that make sense, I'll probably have tens of thousands (at least) tiny files in my cache. Would it make more sense to cache only large query results?
I know that this is a somewhat hardware specific question, but generally speaking at what volume of files does caching become somewhat pointless? Meaning, if you're loading up the file system with all of these tiny files, does access to them eventually become slow enough that you might as well have just not cached the information to start with?
Thanks all, I'm interested in any opinions you have to offer
EDIT: Based on the responses regarding this being absolutely application specific, let me pose the question this way which should be universal:
Assuming that I have an application that depends on one table with 1,000,000 items in it...
Would it be quicker to do a query to retrieve one of those items directly from the database, or to retrieve one of those items from my cache directory with 1,000,000 files, each containing the details of one of those items?
EDIT: Apparently 100,000 wasn't enough to get a valid answer, let's make it 1,000,000. Anyone want to go for 1,000,000,000? Because I can do it...
This is used when there are no frequent writes to the cache(The number of write operations is less). It helps in data recovery (In case of a power outage or system failure). A data write will experience latency (delay) as we have to write to two locations (both Memory and Cache). It Solves the inconsistency problem.
A caching strategy is to determine the relationship between data source and your caching system, and how your data can be accessed. There are various strategies to implement cache but each will have different impacts on your system design and the resulted performance.
Write-through ensures that data is always fresh, but can fail with empty nodes and can populate the cache with superfluous data.
Use MySQL's built in query cache instead of trying to maintain it yourself. It will automatically clear cached queries to tables when they are written to. Plus, it works in memory so it should be very efficient...
Also, don't just cache queries. Try to cache entire segments of the application at different stages in the rendering cycle. So you can let MySQL cache the queries, then you cache each individual view (rendered), each individual block, and each page. Then, you can choose whether or not to pull from cache based upon the request.
For example, a non-logged-in user may get the full page directly from cache. But a logged-in user may not be able to (due to username, etc). So for him, you may be able to render 1/2 your views on the page from cache (since they don't depend on the user object). You still get the benefit of caching, but it'll be tiered based upon need.
If you're really expecting a lot of traffic, it's definitely worth looking into Memcached
. Let MySQL store your queries for you, and then store all user-land cache items in memcache...
Edit: To answer your edit:
Filesystems can become slow if a single directory grows big. As long as you're "namespacing" by directory (so each directory only has a small portion of cache files), you should be fine from that standpoint. As for the exact threshold, it really will depend on your hardware and filesystem more than anything else. I know EXT3 gets quite slow if there are a load of files in a single directory (I have directories with literally hundreds of thousands of files, and it can take up to half a second to simply stat()
one of the files, let alone do any kind of directory listing)...
But realize that if you add another server, you're going to either have duplication of cache (which is not a good thing), or are going to have to rewrite your entire cache layer. Is there a reason not to go with Memcached
right from the start?
EDit 2: To answer your latest edit:
It's still too tough to call. I have an application that has a database with around 1.5 billion rows (growing at around 500k per day). We don't use any caching on it at all because we don't have concurrency issues. And even if we did, we'd be better off throwing more MySQL servers at it rather than adding caching since any form of cache would have such a low hit rate that it wouldn't be worth the development time to add it.
And that's the reason I am so adamant about not caching for speed. There will always be an object that is not in cache. So if you hit a page with one of those objects, it still needs to be fast. As a rule of thumb, I try to cache anything that will be accessed again in the next few minutes (I keep a time to live of about 5 minutes in production on other applications anyway). So if items aren't getting more than a few hits in that time span, or the hit rate is very low (less than 90%), I don't bother caching that item....
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With