
What is the fastest way of indexing to ElasticSearch?

We've been working with ElasticSearch 2.x for quite a while. Everything meets our requirements perfectly except for one weak point: the performance of writing/indexing to the ElasticSearch cluster is not very good.

In our case, we have an 8-node ES cluster, and the indices we are putting into ES are about 100 fields wide. The indexing rate is around 50,000 documents per minute, which is way too slow for our scenario. We've tried all the tuning methods recommended by www.elastic.co. The fastest way we've found is to construct the JSON payloads as files, then dump them into ES using the bulk API. But still, the indexing pace is just too slow.

I've seen the ES-Hadoop connector, and Elasticsearch also has Spark support, where saveToEs() saves an RDD to ES. I suspect they all use the ES bulk API underneath. Can anyone share some experience with them? What is the fastest way of writing indices in ElasticSearch?

Shengjie asked Apr 04 '17 14:04




1 Answer

No matter what third-party tool you use outside ES, everything has to put data in the ES way. Whether it's Spark, Logstash, or your own app, they all need to use the bulk or index API in one way or another. There's no backdoor magic here.
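To make that concrete, here is a minimal sketch of what any such tool ends up doing: serializing documents into the newline-delimited body that the `_bulk` endpoint accepts and POSTing it in batches. The helper names (`bulk_body`, `batches`, `send_bulk`), the index and type names, and the batch size are all hypothetical and only for illustration. The `_type` field is included because the question is about ES 2.x.

```python
import json
import urllib.request
from typing import Dict, Iterable, Iterator, List


def bulk_body(docs: Iterable[Dict], index: str, doc_type: str) -> str:
    """Serialize documents into the newline-delimited body the bulk API expects."""
    lines: List[str] = []
    for doc in docs:
        # Action line: asks ES to index the document on the following line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


def batches(docs: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Group documents into lists of at most `size` items."""
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_bulk(base_url: str, body: str) -> dict:
    """POST one bulk body to <base_url>/_bulk and return the parsed response.

    Assumes a reachable cluster at base_url (hypothetical here); no retries
    or per-item error handling are shown.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage would look something like `for b in batches(docs, 1000): send_bulk("http://localhost:9200", bulk_body(b, "myindex", "mytype"))`, with the URL and names replaced by your own. Since every client funnels into this same request, the knobs left to tune are the batch size, the number of concurrent senders, and the index settings on the cluster side.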

Andrei Stefan answered Oct 28 '22 03:10