What are the different ways to cache a web application data, developed using Java and NoSQL database? Databases also provide caching, are they, the only & always the best option to go with, for caching?
How else can I cache my data of users on the application. Application contains very user specific data like in a social network. Are there some simple thumb rules of what type of things should be cached?
Can I also cache my data on the application server using Java ?
So the standard way to implement cache is to have a data structure, using which we can access value by a given key in constant time. Now all good, we can save key value pairs in memory and retrieve it whenever we need it.
In computing, a cache is a high-speed data storage layer which stores a subset of data, typically transient in nature, so that future requests for that data are served up faster than is possible by accessing the data's primary storage location.
Caching Data Access Strategies Before choosing any strategy, analyze the access pattern of the data & try to fit your application's suitability with any of the following. Read Through / Lazy Loading: Load data into the cache only when necessary. If application needs data for some key x, search in the cache first.
Web caching is performed by retaining HTTP responses and web resources in the cache for the purpose of fulfilling future requests from cache rather than from the origin servers.
If you want a rule of thumb, here's what Michael Jackson (not that Michael Jackson) said:
The ancient tradition is that you don't optimise until you've profiled - that is, until you have hard evidence as to what actually needs to be optimised. Cacheing is a kind of optimisation; it is very likely to be important for your app, but until you are able to put your app under load and look at what objects are taking a long time to obtain (loading from the database or whatever), you won't know what needs cacheing. It really doesn't matter how smart you are, or what advice you get here - until you do that, you will not know what needs to be cached.
As for things you can cache, it's anything, but i suppose you can classify it into three groups:
Finally, the actual mechanism: start with something simple and in-process, like a map with limited size and a least-recently-used eviction policy. That's simple but effective. Something out-of-process like EHCache is more complicated, but has two advantages: you can share caches between multiple processes (helpful if you have a cluster, which you probably will at some point), and you can store data where the garbage collector won't see it, which might save some CPU time (might - this is too big a subject to get into here).
But i reiterate my first point: don't cache until you know what needs to be cached, and once you do, be mindful of the limitations on the benefits of cacheing, and try to keep your cacheing strategy as simple as possible (but no simpler, of course).
I'll assume you're building a relatively typical web application that:
Now, with that stated to answer so of your questions. Most persistence, database or NoSQL, likely have some sort of caching built in such that if you execute the same simple query repeatedly (e.g. retrieval by primary key) it's able to cache the result. However, the more complex the query, the less likely persistence can perform caching on it. In addition, if there's only one server for persistence (i.e. no sharding, or write master/read slaves) it quickly becomes the bottleneck. So the application level caching you want to do usually should occur on the web servers to reduce load on the database.
As far as what should be cached, the heuristic is items frequently accessed and/or expensive to generate (in terms of database/web server processing/memory). Typical candidates are the home page and any other landing page of a site - often the best approach for these is generating a static file and serving that. The next pieces depend on your application, but typically the most effective strategy is caching as close to the final result as possible - often the HTML being served. For your social network this might be a list of featured updates or some such.
As far as user sessions are concerned, these are definitely a good candidate for caching. In this case you can probably get a lot of mileage out of judicious use of the web server's session scope (assuming a JSP server). This data lives in memory and is a good place to keep of user specific information shown once a user authenticates on every page (e.g. first and last name).
Now the final thing to consider is dealing with cache invalidation and really is the hard part of all this (naming stuff is the other hard thing in computer science). In this case using something like memcached or ehcache as others have mentioned is the right approach. ehcache can easily run in process with your java application and does a good job of expiring things, with policies for least recently used and least frequently used, and allowing you to use both memory and disk for caching. What you'll need to think about is the situations where you need to expire something form the cache ahead of this schedule because data's changed. In this case you need to work through those dependencies in your application's architecture so that it read/writes to the cache as appropriate.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With