We've been working with ElasticSearch 2.x for a quite while. Everything meets our requirements perfectly except for one weak point: The performance of writing/indexing to ElasticSearch cluster is not very good.
In our case, we have 8 nodes ES cluster, it's 100~ fields wide indices we are putting in ES. The indexing rate is around 50,000 per minute which is way too slow for our scenario. We've tried all tuning methods recommended by www.elastic.co. The fastest way we've found is that construct the json payload as files, they dump them into ES using bulk API. But still, the indexing pace is just too slow.
I've seen some ES-Hadoop connector, also elasticsearch has spark support where you can use saveToES() saves the RDD to ES. I suspect they all use ES bulk API underneath. Can anyone share some experience on them? What is the fastest way of writing indices in ElasticSearch?
Elasticsearch is fast.Because Elasticsearch is built on top of Lucene, it excels at full-text search. Elasticsearch is also a near real-time search platform, meaning the latency from the time a document is indexed until it becomes searchable is very short — typically one second.
No matter what third party tool you use outside ES, everything needs to use the ES ways of putting data in. Either Spark, Logstash, your own app all need to use bulk or index api in one way or another. There's no backdoor magic here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With