
Caching vs Indexing


What's the real difference between a caching solution and an indexing solution? It seems to me that an indexing solution is in fact a cache with the ability to run search queries (like Elasticsearch). Would there ever be a real reason to use both a caching solution and an indexing solution within the same project, or does the indexing solution basically make any other caching redundant?

Example: Say I use NEST for Elasticsearch, which stores and returns POCOs; if I then query Elasticsearch and have a POCO returned to me, isn't that considered using a cached object returned from Elasticsearch?

At the moment, I store data in a cache using an ICacheManager interface I have, something like this:

return CacheManager.Get(cacheKey, () => {
    // return something...
});

Would this become redundant with Elasticsearch?
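For context, a cache-aside implementation behind an interface like the one above might look like this. This is a minimal sketch: ICacheManager and Get come from the snippet above, while MemoryCacheManager and its dictionary-backed store are illustrative assumptions, not a real library.

```csharp
using System;
using System.Collections.Concurrent;

// Cache-aside sketch: return the cached value when present; otherwise
// run the factory, store the result, and return it.
public interface ICacheManager
{
    T Get<T>(string cacheKey, Func<T> factory);
}

public class MemoryCacheManager : ICacheManager
{
    private readonly ConcurrentDictionary<string, object> _store = new();

    public T Get<T>(string cacheKey, Func<T> factory)
    {
        // GetOrAdd runs the factory only on a cache miss.
        return (T)_store.GetOrAdd(cacheKey, _ => factory());
    }
}
```

A second call with the same cacheKey skips the factory entirely, which is exactly what an index does not promise: Elasticsearch still executes a query on every request, however fast.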

EDIT

Thanks to all of you for the answers. I am fully aware of what a cache is and already understood the general idea behind an index for textual search, so I was only really wondering whether the index doubles as a cache and would therefore make any other cache redundant. After all, I wouldn't want to keep two caches in memory (for example, Elasticsearch + Redis) when one would do fine. I think I have a better idea now, especially after realizing that not all fields are always stored in the index, so we need to get the object from a cache or directly from the DB anyway, at least in some cases. Thanks all!

asked Dec 20 '15 by Matt



2 Answers

The whole purpose of a cache is to return already requested data as fast as possible. One constraint is that a cache cannot grow too big either, as lookup time would increase and defeat the purpose of having a cache in the first place. So it comes as no surprise that if you plan to have a few million or billion records in your DB, it won't be difficult to index them all, but it will be difficult to cache them all; though since RAM keeps getting cheaper, you might be able to fit everything you need in memory. You also need to ask yourself whether your cache needs to be distributed across several hosts, now or in the future.

Considering that lookups and queries in ES are extremely fast (and ES brings many more benefits on top of that, such as scoring), i.e. usually faster than retrieving the same data from your DB, it can make sense to use ES as a cache. One issue I see is a common one: as soon as you start duplicating data (DB -> ES), you need to ensure that both stores don't get out of sync.

Now, if you throw a cache into that mix as well, it's a third data store to maintain and keep consistent with the main one. If you know your data is pretty stable, i.e. written once and not updated frequently, that might be OK, but you need to keep this concern in mind all the time when designing your data access strategy.
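That consistency concern can be sketched as a dual-write: the database is written first as the source of truth, then the index is updated in the same code path. In this hedged sketch all names are illustrative; the two delegates stand in for a real DB client and an Elasticsearch client, not NEST's API.

```csharp
using System;

// Dual-write sketch: the database stays the source of truth and the
// search index is updated in the same code path.
public record Product(int Id, string Name);

public class ProductRepository
{
    private readonly Action<Product> _saveToDb;
    private readonly Action<Product> _indexInSearch;

    public ProductRepository(Action<Product> saveToDb, Action<Product> indexInSearch)
    {
        _saveToDb = saveToDb;
        _indexInSearch = indexInSearch;
    }

    public void Save(Product p)
    {
        _saveToDb(p);       // 1. write the source of truth first
        _indexInSearch(p);  // 2. then update the index; production code often
                            //    queues and retries this step so the stores converge
    }
}
```

If the index update can fail independently of the DB write, pushing step 2 onto a durable queue keeps the two stores eventually consistent without blocking the write path.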

As @paweloque said, in the end it all depends on your exact use case(s). Every problem is different and I can attest that after a few dozen projects around ES over the past five years or so, I've never seen two projects configured the same way. A cache might make sense for some specific cases, but not at all for others.

You need to think hard about how and where you need to store your data, who is requesting it (and at what rate), and who is creating or updating it (and at what rate). In the end, the best practice is to keep your stack as lean as possible, with only as few components as needed, since each one is a potential bottleneck that you have to understand, integrate, maintain, tune and monitor.

Finally, I'd add one more thing: adding a cache or an index should be considered a performance optimization of your software stack. As the common saying goes, "premature optimization is the root of all evil": you should first go with your database only, measure the performance, load test it, and see whether it supports the load. Only then can you decide to throw a cache and/or an index at it, depending on the needs. Again, load test, measure, then decide. If you only have ten users making a few requests per day, a DB alone might be perfectly fine. You have to understand when and why you need to add another layer to your Tower of Babel, but most importantly you need to add one layer at a time and see how that layer improves or degrades the stability of the stack.

Last but not least, you can find some online articles from people having used ES as caches (mainly key-value stores, and object caches).

answered Sep 27 '22 by Val


Your question:

Q. What's the real difference between a caching solution and an indexing solution?

A. The simple difference is that a cache stores frequently used data to serve repeated requests faster. In essence, your cache is faster than your main store but smaller, and therefore limited in how much data it can hold (fast storage is generally more expensive).

Indexing is applied to all of the data to make it searchable faster. A Hashtable/HashMap uses hashes as indexes, and in an array the numeric positions (0, 1, 2, ...) are the indexes.
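The idea can be sketched with a toy inverted index, the structure a search engine like Elasticsearch builds under the hood: each term maps to the set of document ids containing it, so a search touches one dictionary entry instead of scanning every document. The names below are illustrative.

```csharp
using System;
using System.Collections.Generic;

// Toy inverted index: term -> set of document ids containing the term.
public class InvertedIndex
{
    private readonly Dictionary<string, HashSet<int>> _postings = new();

    public void Add(int docId, string text)
    {
        foreach (var term in text.ToLowerInvariant()
                                 .Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            if (!_postings.TryGetValue(term, out var docs))
                _postings[term] = docs = new HashSet<int>();
            docs.Add(docId);
        }
    }

    public IReadOnlyCollection<int> Search(string term)
        => _postings.TryGetValue(term.ToLowerInvariant(), out var docs)
            ? (IReadOnlyCollection<int>)docs
            : Array.Empty<int>();
}
```

Note that the index stores only terms and document ids, not necessarily the full documents; that is why a separate cache or a trip back to the DB can still be needed to materialize the objects.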

You can index some columns to search them faster, but a cache is where you put data to fetch it faster. Normally the cache lives in RAM while the database reads from disk.

A cache is also usually a key-value store: if you know the key, you fetch the value from the cache with no need to run a query. In NHibernate and Entity Framework, query caches plug in with queries as keys and the result data as values, so repeated queries are served from the cache instead of running against the database.
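That query-cache mechanism can be sketched as follows: the cache key is derived from the query text plus its parameters, and the whole result set is the cached value. This is a minimal sketch; the QueryCache name and shapes are illustrative, not NHibernate's or Entity Framework's internals.

```csharp
using System;
using System.Collections.Generic;

// Query-cache sketch: key = query text + parameters, value = result set.
public class QueryCache
{
    private readonly Dictionary<string, IReadOnlyList<string>> _results = new();
    public int Misses { get; private set; }  // exposed only to observe behavior

    public IReadOnlyList<string> GetOrRun(
        string sql, string param, Func<IReadOnlyList<string>> runQuery)
    {
        var key = sql + "|" + param;           // query + params identify the entry
        if (!_results.TryGetValue(key, out var rows))
        {
            Misses++;                          // only a miss hits the database
            _results[key] = rows = runQuery();
        }
        return rows;
    }
}
```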

answered Sep 27 '22 by Basit Anwer