How reliable is ElasticSearch as a primary datastore against factors like write loss, data availability

I am working on a project that requires a generic dashboard where users can group, filter, and drill down on different fields. For this we are looking for a search store that allows slicing and dicing of the data.
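To make the requirement concrete, a dashboard request of this kind could be expressed as a filtered terms aggregation. The sketch below only builds the request body; the field names (`timestamp`, `region`, `product`) and the helper `build_dashboard_query` are hypothetical and not part of the actual data model.

```python
import json


def build_dashboard_query(group_by, filters, since="now-6w"):
    """Build a search body that filters on exact values and groups by one field.

    `group_by` and the keys of `filters` are assumed to be keyword-type fields;
    `since` uses Elasticsearch date math (e.g. "now-6w", "now-6M").
    """
    return {
        "size": 0,  # only aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"timestamp": {"gte": since}}},
                    *[{"term": {field: value}} for field, value in filters.items()],
                ]
            }
        },
        "aggs": {
            "groups": {"terms": {"field": group_by, "size": 20}}
        },
    }


if __name__ == "__main__":
    # Drill down: group by region, restricted to a single (hypothetical) product.
    print(json.dumps(build_dashboard_query("region", {"product": "widget"}), indent=2))
```

Drilling down then amounts to adding the clicked bucket's value to `filters` and grouping by the next field.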

There will be multiple data sources, and their data will be stored in the search store. Some pre-computation may be required on the source data, which can be done by an intermediate component.

I have looked through several blogs to understand whether ES can also be used reliably as a primary datastore. It mostly depends on the use case. Some information about ours:

Around 300 million records each year, at 1-2 KB per record.
Assuming we store one year of data, we are at about 300 GB today, but this could grow to 400-500 GB as the data grows.
We are not yet sure how we will push data, but roughly it can go up to ~2-3 million records per 5 minutes.
Search requests are infrequent but require complex queries over the last 6 weeks to 6 months of data.
Documents will be indexed on almost all of their fields.
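As a rough sanity check, the figures above can be turned into a raw data volume and a peak ingest rate. This is only back-of-the-envelope arithmetic on the numbers stated in the list; it assumes decimal units (1 GB = 1,000,000 KB) and says nothing about the on-disk index size, which depends on mappings and the number of replicas.

```python
# Figures taken from the list above.
RECORDS_PER_YEAR = 300_000_000
RECORD_SIZE_KB = (1, 2)                  # stated size range per record
BATCH_RECORDS = (2_000_000, 3_000_000)   # peak records per push window
BATCH_WINDOW_S = 5 * 60                  # 5 minutes


def raw_volume_gb(records, size_kb):
    """Raw (pre-indexing) volume in decimal GB."""
    return records * size_kb / 1_000_000


def docs_per_second(batch_records, window_s):
    """Average indexing rate needed to absorb one batch within its window."""
    return batch_records / window_s


low_gb = raw_volume_gb(RECORDS_PER_YEAR, RECORD_SIZE_KB[0])
high_gb = raw_volume_gb(RECORDS_PER_YEAR, RECORD_SIZE_KB[1])
low_rate = docs_per_second(BATCH_RECORDS[0], BATCH_WINDOW_S)
high_rate = docs_per_second(BATCH_RECORDS[1], BATCH_WINDOW_S)

print(f"raw volume per year: {low_gb:.0f}-{high_gb:.0f} GB")
print(f"peak ingest rate: {low_rate:.0f}-{high_rate:.0f} docs/s")
```

With these inputs the raw volume comes out at roughly 300-600 GB per year and the peak rate at about 6,700-10,000 documents per second.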

Some blogs say that it is reliable enough to use as a primary data store:

http://chrisberkhout.com/blog/elasticsearch-as-a-primary-data-store/
http://highscalability.com/blog/2014/1/6/how-hipchat-stores-and-indexes-billions-of-messages-using-el.html
https://karussell.wordpress.com/2011/07/13/jetslide-uses-elasticsearch-as-database/

And some blogs say that ES has a few limitations:

https://www.found.no/foundation/elasticsearch-as-nosql/
https://www.found.no/foundation/crash-elasticsearch/
http://www.quora.com/Why-should-I-NOT-use-ElasticSearch-as-my-primary-datastore

Has anyone used Elasticsearch as the sole source of truth, without a primary store like PostgreSQL, DynamoDB or RDS? I have read that ES has certain issues, such as split brain and index corruption, that can lead to data loss. So I would like to know whether anyone has used ES this way and run into trouble with their data.

Thanks.

332

asked Apr 24 '15 07:04

Harshit Agrawal

2 Answers

Short answer: it depends on your use case, but you probably don't want to use it as a primary store.

Longer answer: You should understand all of the possible issues around resiliency and data loss before using it as a primary data store. Elastic has some great documentation of these issues, and Aphyr's post on the topic is also a good resource.

If you understand the risks you are taking and believe they are acceptable (e.g. because a small amount of data loss is not a problem for your application), then feel free to go ahead and try it.

158

answered Oct 01 '22 23:10

Cory

It is generally a good idea to design redundant data storage. For example, a fast and reliable approach is to first push everything as flat data to static storage like S3, then have ES pull and index the data from there. If you need more flexibility, for example to use an ORM, you could put an RDS or Redshift layer in between. This way the data can always be rebuilt in ES.

How you balance redundancy against flexibility and performance depends on your needs and requirements. If there's a lot of data involved, you could store the raw data statically and index only some parts of it with ES.
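As a minimal sketch of the rebuild step, assuming the raw data is stored as one JSON object per line, the following turns raw lines into a request body in the newline-delimited format the Elasticsearch Bulk API expects, keeping only a chosen subset of fields. The helper `to_bulk_body`, the index name, and the field names are hypothetical; reading from S3 and sending the body to the cluster are left out.

```python
import json


def to_bulk_body(raw_lines, index, id_field, indexed_fields):
    """Convert raw JSON lines into a Bulk API body (action line + document line).

    Only `indexed_fields` are copied into the indexed document; the full record
    stays in the raw store. Using the record's own id as `_id` means a re-run
    overwrites documents instead of duplicating them.
    """
    parts = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the raw file
        record = json.loads(line)
        action = {"index": {"_index": index, "_id": str(record[id_field])}}
        doc = {k: record[k] for k in indexed_fields if k in record}
        parts.append(json.dumps(action))
        parts.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return ("\n".join(parts) + "\n") if parts else ""


if __name__ == "__main__":
    raw = [
        '{"id": 1, "region": "eu", "amount": 10, "payload": "large blob"}',
        '{"id": 2, "region": "us", "amount": 25, "payload": "large blob"}',
    ]
    print(to_bulk_body(raw, "events", "id", ["region", "amount"]), end="")
```

Because the raw files remain the source of truth, the same function can be re-run over all stored files to rebuild an index from scratch.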

AWS Lambda offers great features for this:

Many developers store objects in Amazon S3 while using Amazon DynamoDB to store and index the object metadata and enable high speed search. AWS Lambda makes it easy to keep everything in sync by running a function to automatically update the index in Amazon DynamoDB every time objects are added or updated from Amazon S3.
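The sync step described in that quote could look roughly like the sketch below: a Lambda handler that reads the S3 notification event and passes each created or updated object to an indexing function. `update_index` is a hypothetical placeholder for whatever writes to the index (DynamoDB, ES, or something else); only the event parsing follows the S3 notification format.

```python
from urllib.parse import unquote_plus


def changed_objects(event):
    """Return (bucket, key) pairs for created or overwritten objects in an S3 event."""
    changed = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletions and other event types
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications.
        changed.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return changed


def update_index(bucket, key):
    """Hypothetical placeholder: fetch the object and (re)index it."""
    print(f"would reindex s3://{bucket}/{key}")


def handler(event, context):
    """Lambda entry point: reindex every object named in the event."""
    objects = changed_objects(event)
    for bucket, key in objects:
        update_index(bucket, key)
    return {"reindexed": len(objects)}
```

Keeping the parsing in `changed_objects` separate from the handler makes it easy to test locally with a hand-written event dictionary.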

answered Oct 01 '22 23:10

marekful

Tags: full-text-search, nosql, elasticsearch, search-engine
