
In-process cache vs distributed cache on consistency with mutable/immutable objects

I heard my colleague say that an in-process cache would be the better option when caching immutable objects, where consistency is not a big issue (eventual consistency is enough), whereas an external distributed cache is more suitable for mutable objects, where you always want your reads to be consistent (strong consistency).

Is this always true? I don't really see how mutability is related to consistency. Can someone help me understand this?

asked Oct 25 '15 by peter


People also ask

What is in process cache?

As the name suggests, an in-process cache is an object cache built within the same address space as your application. The Google Guava Library provides a simple in-process cache API that is a good example.
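
For illustration, here is a minimal sketch of such an in-process cache built with Guava's Cache API; the key, value, size limit, and expiry below are made-up examples, not recommendations:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.util.concurrent.TimeUnit;

public class InProcessCacheExample {
    public static void main(String[] args) {
        // The cache lives in the application's own heap, so lookups are
        // plain local method calls with no network hop.
        Cache<String, String> userProfiles = CacheBuilder.newBuilder()
                .maximumSize(10_000)                      // illustrative size limit
                .expireAfterWrite(10, TimeUnit.MINUTES)   // illustrative TTL
                .build();

        userProfiles.put("user:42", "{\"name\":\"Ada\"}");

        // getIfPresent returns null on a cache miss.
        System.out.println(userProfiles.getIfPresent("user:42"));
    }
}
```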

What types of data do you think are most important to have cache for quick distribution?

A general cache use case is in-memory data lookup: if you have a mobile or web app front end, you might want to cache information such as user profiles, historical/static data, or API responses, depending on your use case. Caching helps store such data.


2 Answers

When you use a distributed cache, each object is replicated across multiple independent machines, i.e. multiple cache nodes.

If your objects are immutable, replication is not an issue: since the objects never change, any cache instance will deliver exactly the same objects.

As soon as the objects become mutable, the consistency issue arises: when you ask a cache instance for an object, how can you be sure that the object delivered to you is up to date? What if, while one cache instance was serving you, the object was being modified by another user on another cache instance? In that case, you would not receive the latest version; you would receive a stale version.
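
As a toy illustration of this stale-read scenario, two plain maps below stand in for two cache nodes whose replication has not yet caught up; the key and values are invented:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch, not a real cache: two unsynchronized "cache nodes",
// each holding its own copy of a mutable object.
public class StaleReadSketch {
    public static void main(String[] args) {
        Map<String, String> nodeA = new HashMap<>();
        Map<String, String> nodeB = new HashMap<>();

        // Both nodes start with the same replicated value.
        nodeA.put("account:7", "balance=100");
        nodeB.put("account:7", "balance=100");

        // A client connected to node A updates the object...
        nodeA.put("account:7", "balance=50");

        // ...but until replication catches up, node B still serves the old version.
        System.out.println("node A: " + nodeA.get("account:7")); // balance=50
        System.out.println("node B: " + nodeB.get("account:7")); // balance=100 (stale)
    }
}
```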

To deal with this issue, a choice has to be made. One option is to accept some degree of staleness, which allows better performance. Another option is to use some synchronization protocol so that you never receive stale data, but there is obviously a performance penalty to pay for synchronizing data between distant cache nodes.
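
One common way to accept a bounded degree of staleness is to give every entry a time-to-live, so a stale copy can be served for at most that long before it is reloaded from the authoritative store. Here is a minimal sketch of the idea, shown with an in-process Guava loading cache for brevity and assuming a hypothetical loadFromDatabase helper as the source of truth; the same principle applies to entries in a distributed cache:

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.concurrent.TimeUnit;

public class BoundedStalenessSketch {
    public static void main(String[] args) throws Exception {
        // A short TTL bounds how old a cached value can get.
        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                .expireAfterWrite(5, TimeUnit.SECONDS)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return loadFromDatabase(key); // hypothetical source of truth
                    }
                });

        System.out.println(cache.get("product:99")); // loaded on first access, then cached
        System.out.println(cache.get("product:99")); // may be up to 5 seconds stale
    }

    private static String loadFromDatabase(String key) {
        // Placeholder for a real lookup against the authoritative store.
        return "value-for-" + key;
    }
}
```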

Conversely, imagine that you push some modifications of an object to one cache node. What if, at the same time, another user pushes modifications of the same object to another cache node? Should this be allowed, or should it be prevented by some locking mechanism?
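
One middle ground between allowing conflicting writes and full locking is optimistic concurrency: each value carries a version, and a write is rejected when the version it was based on is no longer current. Below is a toy, single-process sketch of that idea; the Versioned record and the keys are invented:

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of optimistic concurrency control: each write must name the
// version it read, and it loses if another writer got there first.
public class VersionedPutSketch {

    // A value together with the version it was written at (hypothetical type).
    record Versioned(long version, String value) {}

    private final ConcurrentHashMap<String, Versioned> store = new ConcurrentHashMap<>();

    // Returns true if the write won, false if someone else updated the key first.
    boolean putIfVersionMatches(String key, long expectedVersion, String newValue) {
        Versioned current = store.get(key);
        if (current == null || current.version() != expectedVersion) {
            return false; // conflict detected: the caller must re-read and retry
        }
        return store.replace(key, current, new Versioned(expectedVersion + 1, newValue));
    }

    public static void main(String[] args) {
        VersionedPutSketch cache = new VersionedPutSketch();
        cache.store.put("doc:1", new Versioned(1, "draft"));

        System.out.println(cache.putIfVersionMatches("doc:1", 1, "published")); // true
        System.out.println(cache.putIfVersionMatches("doc:1", 1, "archived"));  // false: stale version
    }
}
```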

In addition, should object modifications on your cache node become immediately visible to the users of this cache node? Or should they become visible only after they have been replicated to the other nodes?

At the end of the day, mutable objects do make things more complicated when a distributed cache is shared among multiple users. Still, that doesn't mean these caches should not be used: it just means that it takes more time and more caution to study the available options and choose the appropriate cache for each application.

answered Sep 30 '22 by Daniel Strul


Although Daniel has given a good explanation, for some reason it wasn't 100% clear to me. So I googled, and this article cleared things up for me.

Excerpts from the article:

While using an in-process cache, your cache elements are local to a single instance of your application. Many medium-to-large applications, however, will not have a single application instance, as they will most likely be load-balanced. In such a setting, you will end up with as many caches as you have application instances, each with a different state, resulting in inconsistency.

Distributed caches, although deployed on a cluster of multiple nodes, offer a single logical view (and state) of the cache. In most cases, an object stored in a distributed cache cluster will reside on a single node. By means of a hashing algorithm, the cache engine can always determine on which node a particular key-value pair resides. Since there is always a single state of the cache cluster, it is never inconsistent.
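
A minimal sketch of the key-to-node mapping the excerpt describes; the node addresses are placeholders, and real products usually use consistent hashing rather than the plain modulo shown here so that adding or removing nodes moves as few keys as possible:

```java
import java.util.List;

// Minimal sketch: a distributed cache client maps each key to the single node
// that owns it, so every client reads and writes the same authoritative copy.
public class NodeSelectionSketch {

    private final List<String> nodeAddresses;

    NodeSelectionSketch(List<String> nodeAddresses) {
        this.nodeAddresses = nodeAddresses;
    }

    String nodeFor(String key) {
        // Mask to a non-negative value before taking the modulo.
        int bucket = (key.hashCode() & 0x7fffffff) % nodeAddresses.size();
        return nodeAddresses.get(bucket);
    }

    public static void main(String[] args) {
        NodeSelectionSketch cluster = new NodeSelectionSketch(
                List.of("cache-1:11211", "cache-2:11211", "cache-3:11211"));

        System.out.println(cluster.nodeFor("user:42"));
        System.out.println(cluster.nodeFor("session:abc"));
    }
}
```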

If you are caching immutable objects, consistency ceases to be an issue. In such a case, an in-process cache is the better choice, as many overheads typically associated with external distributed caches are simply not there. If your application is deployed on multiple nodes, you cache mutable objects, and you want your reads to always be consistent rather than eventually consistent, a distributed cache is the way to go.
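
For completeness, this is roughly what talking to an external distributed cache looks like from application code, sketched here with the Jedis client for Redis; the host, port, key, TTL, and payload are placeholders:

```java
import redis.clients.jedis.Jedis;

public class DistributedCacheSketch {
    public static void main(String[] args) {
        // Connect to a single Redis endpoint; a clustered deployment would
        // normally hide node selection behind the client library.
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Store a mutable object's serialized form with a 5-minute TTL.
            jedis.setex("user:42:profile", 300, "{\"name\":\"Ada\",\"plan\":\"pro\"}");

            // Any application instance that asks this cluster sees the same state.
            System.out.println(jedis.get("user:42:profile"));
        }
    }
}
```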

answered Sep 30 '22 by Guanxi