ElasticSearch vs. ElasticSearch+Cassandra

Tags: nosql, cassandra, elasticsearch

My main question is: what is the benefit of integrating Cassandra with Elasticsearch versus using only Elasticsearch?

There are already answers to similar questions on StackOverflow (e.g., here and here), but I have a few concerns about them:

Many of those answers are old, and much may have changed since then.
One point that is mentioned is that "Sometimes ElasticSearch loses writes". However, those alleged losses may have been caused by bugs that have been fixed over the years. Cassandra, too, may have bugs that cause data loss. Is there any fundamental difference between Cassandra and Elasticsearch that causes Elasticsearch to lose data but not Cassandra?
It is also mentioned that "Schema changes are difficult to do in ElasticSearch without blowing everything away and reloading." This may not be a major problem for us, since our data model is relatively stable, or at least backward-compatible. Also, thanks to dynamic mapping, Elasticsearch can adapt to new requirements (e.g., extra fields).
Regarding the indexing delay in Elasticsearch: Cassandra does not guarantee immediate consistency either, so in Cassandra you may also face delays before written data can be read.

Overall, what extra features does Cassandra offer when used in conjunction with Elasticsearch?

P.S. A general answer would be best. But if it matters, assume that we only append rows to the database and never update or delete anything, and that we want to be able to do full-text search over the data.


asked Apr 15 '20 08:04

Shayan

1 Answer

As the author of one of the linked answers (Elasticsearch vs Cassandra vs Elasticsearch with Cassandra), I suppose I should weigh in here.

those alleged losses may have been caused by bugs that have been fixed over the years.

This is an absolutely true statement. The answer I wrote is almost six years old, and ElasticSearch has grown to be a much more reliable product in that time. That being said, there are some things which Cassandra can do that ElasticSearch just wasn't designed to do (and vice-versa).

what extra features does Cassandra offer...

I can think of a few, which I'll summarize here:

Write throughput/performance/latency

ElasticSearch is a search engine based on the Lucene project. Handling high write throughput at low latency is just not something it was designed to do, at least not "out of the box." There are ways to configure ElasticSearch to be better at this, as described here: Techniques to Achieve High Write Throughput With ElasticSearch. But if you are building a new cluster with minimal configuration, you'll spend less time engineering Cassandra to accomplish this.
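
The linked article isn't reproduced here, but one widely used technique for raising ElasticSearch write throughput is to batch documents into requests for the "_bulk" API instead of indexing them one at a time. This minimal Python sketch only builds the newline-delimited JSON (NDJSON) bodies that endpoint expects. The index name "events" and the batch size of 500 are hypothetical. Sending each body (an HTTP POST to /_bulk with Content-Type: application/x-ndjson) is left out.

```python
import json
from typing import Iterable, Iterator


def bulk_bodies(docs: Iterable[dict], index: str, batch_size: int = 500) -> Iterator[str]:
    """Yield NDJSON bodies for the _bulk endpoint, each holding up to batch_size docs."""
    lines: list[str] = []
    count = 0
    for doc in docs:
        # Each document is an action/metadata line followed by its source line.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
        count += 1
        if count == batch_size:
            # The bulk API requires the body to end with a newline.
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:
        # Flush the final, partially filled batch.
        yield "\n".join(lines) + "\n"
```

Fewer, larger requests amortize per-request overhead. The right batch size depends on document size and cluster capacity, so 500 is only a starting guess.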

"Sometimes ElasticSearch loses writes"

Yes, I wrote that. Again, ElasticSearch has improved. A lot. But I still see this happen under high write throughput. When a cluster is engineered for a certain level of throughput and an application exceeds those tolerances, a node can become overwhelmed by write back-pressure, and writes will be lost.

Cassandra is not immune to this problem either; it just has a higher tolerance for it. If you use them together, putting something like Kafka in front of both to "throttle" the write throughput to each would be a good approach.
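
Kafka itself can't be shown in a few lines. As a swapped-in stand-in for the throttling idea, here is a token bucket, a common rate-limiting technique that a consumer could use to cap how fast it forwards writes to each store. The class name, the rate and capacity values, and the injectable clock (there only to make the example deterministic) are hypothetical choices for illustration, not part of Kafka, Cassandra, or ElasticSearch.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow roughly `rate` writes per second, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start full, allowing an initial burst
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; False means "hold this write for now"."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For example, with rate=10 and capacity=10, fifty immediate calls succeed ten times, and half a second later five more succeed. Writes that are refused would stay buffered upstream (in Kafka, in this architecture) rather than overwhelming a node.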

Multi-Data Center High Availability (MDHA)

With the ability to define logical data centers and availability zones (racks), Cassandra has always been good at replicating a data set over multiple regions. This is problematic for ElasticSearch, as it does not have a concept of a logical data center, and its "master" nodes are not active/active.
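
In Cassandra, replication is set per keyspace. With NetworkTopologyStrategy you give a replica count for each data center (for example, 3 in one DC and 3 in another). The Python sketch below is a deliberately simplified model of that idea, not Cassandra's actual placement code. It walks a hypothetical token ring from a starting position and, within each data center, prefers nodes on racks it hasn't used yet, so that losing one rack doesn't take out every replica. Node, DC, and rack names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    dc: str
    rack: str


def place_replicas(ring: list[Node], start: int, rf_per_dc: dict[str, int]) -> list[Node]:
    """Pick rf_per_dc[dc] replicas per data center, walking the ring from `start`
    and preferring racks not yet used in that DC (a simplified model)."""
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    chosen: list[Node] = []
    for dc, rf in rf_per_dc.items():
        in_dc = [n for n in walk if n.dc == dc]
        picked: list[Node] = []
        used_racks: set[str] = set()
        # First pass: at most one replica per distinct rack.
        for n in in_dc:
            if len(picked) < rf and n.rack not in used_racks:
                picked.append(n)
                used_racks.add(n.rack)
        # Second pass: if there are fewer racks than replicas, reuse racks.
        for n in in_dc:
            if len(picked) < rf and n not in picked:
                picked.append(n)
        chosen.extend(picked)
    return chosen
```

The point is that "data center" and "rack" are first-class inputs to replica placement, which is what the answer means by logical data centers.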

Peer nodes vs. role-based nodes

As a follow-up to my MDHA point, ElasticSearch now allows nodes to be designated with a "role" in the cluster. You can designate multiple nodes with the "master" role, which is in charge of adding and updating indexes. Any node can direct search traffic to the nodes that work under the "data" role. In fact, one way to improve write throughput (my first talking point) is to designate a node or two with the "ingest" role, which can keep read and write traffic from interfering with each other.

This deviates from Cassandra's approach, where every node is a peer and can handle reads and writes. Being able to treat all nodes the same simplifies maintenance and administration. And no, despite a popular misconception, a "seed" node is not anything special.
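
To make the contrast concrete, here is a toy routing function in Python. It is hypothetical and not how either product routes requests internally (in ElasticSearch, for instance, any node can still act as a coordinator). In the role-based model, writes go to nodes holding the "ingest" role and searches go to "data" nodes, as described above. In the peer model, every node is eligible for everything. Node names and the operation-to-role mapping are illustrative assumptions.

```python
# Hypothetical mapping from operation type to the role that should serve it.
ROLE_FOR_OPERATION = {"write": "ingest", "search": "data"}


def eligible_nodes(nodes: dict[str, set[str]], operation: str, peer_cluster: bool) -> list[str]:
    """Return the names of nodes that may handle `operation`.

    `nodes` maps node name -> set of roles; roles are ignored in a peer cluster.
    """
    if peer_cluster:
        # Cassandra-style: every node is a peer and can coordinate any request.
        return sorted(nodes)
    # Role-based style: only nodes carrying the matching role are candidates.
    wanted = ROLE_FOR_OPERATION[operation]
    return sorted(name for name, roles in nodes.items() if wanted in roles)
```

In the peer model, there is simply no per-role bookkeeping to maintain, which is the administrative simplicity the answer refers to.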

Query vs. Search

To me, this is the fundamental difference between the two. Querying is not the same as searching. They may seem similar, but they are quite different.

Retrieving data by matching a pattern on one or more columns/properties is searching. With searching, the number of results is also largely unknown beforehand. Sure, Cassandra has added some features in the last few years that allow pattern matching with LIKE queries (I don't recommend using them). But when the ability to "search" a data set is required, Cassandra can't compete with ElasticSearch.
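
Lucene, and therefore ElasticSearch, is built around an inverted index: a map from each term to the documents that contain it. The toy Python version below uses a crude lowercase-and-split tokenizer in place of a real analyzer and has no relevance scoring. It shows why the size of a search result isn't known until the postings for each term have been intersected.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Crude analyzer: lowercase, then split on runs of non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]


class InvertedIndex:
    def __init__(self) -> None:
        # term -> ids of documents containing that term
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[int]:
        """Return ids of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

The same query can match zero documents or most of the corpus, depending entirely on the data.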

Retrieving data by providing a specific value for a specific key (column) is querying. With querying, it is also easier to have accurate expectations about the number of results returned. If I were building an app and knew that I'd only ever have to retrieve data with a static, pre-defined query on a specific key, I'd choose Cassandra every time.
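
By contrast, a Cassandra query names the partition key up front, so the work and the result size are bounded by a single partition. This in-memory Python model of a table with a partition key and a clustering column only illustrates that access pattern, not Cassandra's storage engine. The class name, layout, and sample keys are hypothetical.

```python
from collections import defaultdict
from typing import Optional


class PartitionedTable:
    """Toy model of a table whose primary key is ((pk), ck)."""

    def __init__(self) -> None:
        # partition key -> {clustering key -> row}
        self._partitions: dict[str, dict[int, dict]] = defaultdict(dict)

    def insert(self, pk: str, ck: int, row: dict) -> None:
        # As with a Cassandra INSERT, writing the same primary key again overwrites it.
        self._partitions[pk][ck] = row

    def select(self, pk: str, limit: Optional[int] = None) -> list[dict]:
        """Roughly: SELECT * FROM t WHERE pk = ? [LIMIT n], in clustering order."""
        partition = self._partitions.get(pk, {})
        rows = [partition[ck] for ck in sorted(partition)]
        return rows if limit is None else rows[:limit]
```

Only one partition is ever touched, so the cost of the lookup does not grow with the size of the table.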

With Cassandra, I can also tune query consistency, requiring acknowledgement from more or fewer replicas. Likewise, I can direct those operations to a specific geographic region, based on the locality of the application.
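
The arithmetic behind tunable consistency is short enough to show. In Cassandra, QUORUM needs a majority of all replicas (the replication factors summed across data centers), and LOCAL_QUORUM needs a majority of the replicas in the coordinator's local data center. If the number of replicas acknowledging a read (R) plus the number acknowledging a write (W) exceeds the replication factor (RF), the two sets must overlap, so the read sees the latest write. The helper names, data center names, and replication factors below are hypothetical.

```python
def quorum(replicas: int) -> int:
    """Acknowledgements a quorum-style operation needs: a strict majority."""
    return replicas // 2 + 1


def overlaps(read_acks: int, write_acks: int, replication_factor: int) -> bool:
    """True if every read replica set must intersect every write set (R + W > RF)."""
    return read_acks + write_acks > replication_factor


# Hypothetical keyspace replicated 3 ways in each of two data centers.
rf = {"us_east": 3, "eu_west": 3}
total_rf = sum(rf.values())
```

With these numbers, quorum(total_rf) is 4 (QUORUM across both DCs) and quorum(rf["us_east"]) is 2 (LOCAL_QUORUM in one DC). Reading and writing at ONE with RF=3 gives overlaps(1, 1, 3) == False, so reads may be stale. LOCAL_QUORUM for both gives overlaps(2, 2, 3) == True within that data center. This is the same kind of read-after-write delay raised in the question, and it can be tuned per operation.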

...when used in conjunction with Elasticsearch?

They complement each other well. Cassandra is good at some things (detailed above) that ElasticSearch is not (and vice versa; I'm saying that a lot). An application's requirements may call for both searching and querying. Sometimes you've got an app that needs that high-speed key lookup, and then: "oh, and we also want search."
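
One common way to combine the two, sketched here as an assumption rather than a prescribed design, is to treat Cassandra as the system of record for key lookups and index the searchable text in ElasticSearch, then use search results (keys) to fetch the full rows. The in-memory stand-ins below are hypothetical and synchronous. In a real deployment, the second write would usually be decoupled (for example, through a queue like the Kafka setup mentioned above) and would need retry handling.

```python
import re
from collections import defaultdict
from typing import Optional


class KeyValueStore:
    """Stand-in for Cassandra: rows looked up by primary key."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def put(self, key: str, row: dict) -> None:
        self.rows[key] = row

    def get(self, key: str) -> Optional[dict]:
        return self.rows.get(key)


class SearchIndex:
    """Stand-in for ElasticSearch: maps lowercase terms to the keys of matching rows."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def index(self, key: str, text: str) -> None:
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(key)

    def search(self, term: str) -> set[str]:
        return set(self.postings.get(term.lower(), set()))


class App:
    def __init__(self) -> None:
        self.store = KeyValueStore()
        self.search_index = SearchIndex()

    def save(self, key: str, row: dict) -> None:
        # Write to the system of record first, then index the searchable text.
        self.store.put(key, row)
        self.search_index.index(key, row["text"])

    def lookup(self, key: str) -> Optional[dict]:
        # The "high-speed key lookup" path: served by the key-value store alone.
        return self.store.get(key)

    def find(self, term: str) -> list[dict]:
        # The "we also want search" path: the index finds keys, the store returns rows.
        return [self.store.rows[k] for k in sorted(self.search_index.search(term))]
```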

Summary, tl;dr;

So while I've written quite a bit here, the main point I keep coming back to is picking the right tool for the job. When I need to search, I'll pick ElasticSearch. When I need to query in a highly available, geographically aware scenario, I'll pick Cassandra. I still see applications use both in tandem, so both have their merits.


answered Oct 21 '22 08:10

Aaron

