
System/OS Caching vs. Application Caching

When developing applications that work with compressed on-disk indexes or on-disk files where parts of the index or the file are accessed repetitively (for argument's sake, let's say with something akin to a Zipfian distribution), I wonder when it is sufficient/better to rely on OS-level caching (e.g., memory mapping on, say, a Debian system), and when it is better to implement something at the application layer (e.g., something like FileChannel buffering, Memcached, or a custom LRU cache in Java code).
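To make the OS-level option concrete, here is a minimal sketch (using the standard java.nio API) of reading from a memory-mapped file: the bytes come straight out of the OS page cache, with no application-level buffering at all. The file name and offset are just illustrative.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapExample {
    // Read one byte via a memory-mapped region; repeated reads of hot
    // regions are served from the OS page cache, not from Java buffers.
    static byte readByteAt(Path file, long offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.get((int) offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".bin");
        Files.write(tmp, new byte[] {10, 20, 30});
        System.out.println(readByteAt(tmp, 2));
        Files.delete(tmp);
    }
}
```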

For example, one article (in reference to Solr) argues for leaving memory free for OS-caching:

The OS’s cache is really useful, it decreases significantly the time required to answer a query (even after completely restarting the server!), so always remember to keep some memory free for the OS.

This got me wondering whether my application-level cache that fills memory with weak maps to LRU Java objects is doing more harm than good, especially since Java is so greedy in terms of memory overhead ... instead of using that memory to cache a few final result objects, would that space be better used by the OS to cache lots of raw compressed data? On the other hand, the application-layer cache would be better for platform independence, allowing for caching no matter what OS the code was running on.
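For reference, the kind of application-level cache I have in mind can be sketched in a few lines with LinkedHashMap's access-order mode (a common idiom, not my actual implementation, and the capacity is a made-up number):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an application-level LRU cache of decoded result objects.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict least-recently-used when over capacity
    }
}
```

Every byte this cache holds is a byte the OS cannot use for its own disk cache, which is exactly the trade-off in question.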

And so I realised that I had no idea how to go about answering that question in a principled way, other than running a couple of specific benchmarks. Which leads me to ask ...

What general guidelines exist for whether to assign available memory for application-level caching, or to leave that memory available for OS-level caching?

In particular, I'd love to be able to better recognise when coding an application-level cache is a waste of time, or even harmful for performance.

asked Oct 26 '12 by badroit


1 Answer

Ultimately the answer is always to measure first, analyze, and then optimize. Run your application under a profiler with and without caching, and see what the differences are. There is simply no substitute for direct observation.

Having said that, there is a principled way to think about your problem. Think about what a cache can do for you:

  • Trade time for memory. The time involved might be I/O time, or it could be CPU time.
  • Trade a spike in working set memory for a smaller, longer-term increase of working memory.

So, specific to your situation you need to ask the following questions.

  • Without the cache, is your application I/O bound? If you spend 98% of your time chewing on the data and only 2% of your time looking for it, then a cache won't help you much no matter how efficient it is. (A perfectly efficient cache in this case would only increase your performance by about 2%.)
  • How much work does a cache hit avoid? If a cache hit avoids a single fread() call, then maybe the cache isn't doing very much for you. But if a cache hit avoids randomly traversing a few hundred blocks of several very large files, then maybe it is saving you a lot of time. It could also save you a lot of space in the OS's disk cache, making that memory available for other OS operations.
  • What is the rate of cache hits?
  • How large do you have to make the cache to get a good hit rate (usually above 75%)? If the answer is in the hundreds of megabytes then you might as well just let the OS's disk cache do the work for you.

It is often very helpful to make these aspects of your application configurable (whether or not to use the cache, how much memory to set aside for caching, etc.) and then play with the settings to see what works best for a given scenario.
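One simple way to get that configurability, as a sketch: read the knobs from system properties so they can be tuned per run without recompiling. The property names and defaults here are hypothetical.

```java
// Cache settings read from -Dapp.cache.enabled=... and -Dapp.cache.maxMb=...
public class CacheConfig {
    static boolean cacheEnabled() {
        return Boolean.parseBoolean(System.getProperty("app.cache.enabled", "true"));
    }

    static int cacheMaxMb() {
        return Integer.parseInt(System.getProperty("app.cache.maxMb", "64"));
    }

    public static void main(String[] args) {
        System.out.println("cache enabled: " + cacheEnabled()
                + ", budget: " + cacheMaxMb() + " MB");
    }
}
```

With this in place you can benchmark, say, `-Dapp.cache.maxMb=0` against `-Dapp.cache.maxMb=512` under the same workload and let the numbers decide.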

One of the most interesting developments these days is the availability of solid-state drives. The throughput on these drives isn't as fast as on the better spindles, but random access is often outstanding. That definitely changes things.

Again, there is no substitute for profiling your code.

answered Sep 20 '22 by slashingweapon