
How to tune Elasticsearch to make indexing fast?

I do not run any complicated queries on my Elasticsearch cluster. I use Elasticsearch only for fast searches on large datasets.

It is running fine. The search is simple and fast.

But as the index grows, adding new documents becomes slower and slower.

  • When the index is small, indexing 1 million documents takes about 250 seconds.
  • When the same index reaches about 50 GB, indexing 1 million documents takes about 1000 seconds.
  • When the same index reaches 100 GB, indexing 1 million documents takes much longer.
  • Sometimes, while indexing 1 million documents, I see an Elasticsearch connection error. It comes from the client code near the comment line: //<2.0 "i just blew up" nonstructured exception. I only see this error when indexing 1 million documents into a large index (about 100 GB). When the index is smaller, the error does not appear in the log.

I would like to tune the Elasticsearch cluster so that it still returns search results fast, but can also index documents fast even when an index reaches 100 GB or more.

My current plan and open questions:

  • Use 3 nodes in one cluster. (I did not find a good answer on how many nodes a cluster should have; three seems reasonable, as some articles suggest.)
  • Use 5 shards and 1 replica for each index. (I did not find a good number here either; this is the current default.)
  • Right now I have 5 to 10 indices on one cluster; the cluster size is 1000 GB (300 GB used). Instead of running 10 indices on a 1000 GB cluster, would running one index per cluster (cluster size 200 GB) give better indexing and search performance?
  • The documents I add are summarized, projected data with 6 to 12 fields each. I mapped most of the fields as the keyword data type. If I map fewer fields as keyword, say only half of them, how much would indexing speed improve? (In my case the index reaches 100 GB, and each day I batch index 1 million documents into it.)

So what changes can I make to the above setup to improve indexing speed and reduce errors such as the Elasticsearch connection error?

I am using AWS hosted Elasticsearch.

What else could I do?

Thanks!

searain asked Oct 11 '18 03:10




1 Answer

When you index documents, the cluster also has to copy that data to the replica shards on other nodes. Some changes can improve indexing performance:

1 - Set a large refresh_interval while indexing. This delays the point at which newly indexed documents become searchable, so fewer small segments are created and indexing is faster.

2 - Use an optimal batch size when bulk indexing. The best size depends on your documents and cluster, so benchmark with increasing batch sizes until throughput stops improving.

3 - Set the heap size properly. For example, on a node with 64 GB of RAM, a heap of about 31 GB is optimal. For details - https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html

4 - Increase the file descriptor and mmap limits - https://www.elastic.co/guide/en/elasticsearch/guide/current/_file_descriptors_and_mmap.html

5 - If you transform your data during ingestion, a dedicated ingest node can be used - https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html

6 - Disable replication during the load (you can re-enable it after the big indexing job).
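Tips 1, 2 and 6 can be sketched in Python as follows. This is only an illustration, not a tested setup for your cluster: the helper names (bulk_load_settings, normal_settings, chunked, bulk_body), the index name "my-index" and the batch size are made up for the example. The dicts are meant to be sent as the body of PUT /<index>/_settings, and the NDJSON string as the body of POST /_bulk, using whatever client you already have. The bulk action lines omit _type, which assumes a recent Elasticsearch version; older versions need a type in the URL or in the action line.

```python
import json
from typing import Dict, Iterable, Iterator, List


def bulk_load_settings() -> Dict:
    """Settings body to apply before a large batch load.

    "-1" turns periodic refresh off; 0 replicas avoids copying every
    document to a replica shard while the load is running.
    """
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def normal_settings(refresh_interval: str = "1s", replicas: int = 1) -> Dict:
    """Settings body to restore once the batch load has finished."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    }


def chunked(docs: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` documents, in order."""
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # last, possibly smaller, batch
        yield batch


def bulk_body(index: str, batch: List[Dict]) -> str:
    """Build the newline-delimited JSON body for one _bulk request."""
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document source
    return "\n".join(lines) + "\n"  # the body must end with a newline


if __name__ == "__main__":
    # Hypothetical documents; a real job would stream them from the source data.
    docs = ({"id": i, "category": "demo"} for i in range(5))
    print(json.dumps(bulk_load_settings()))
    for batch in chunked(docs, size=2):
        print(bulk_body("my-index", batch), end="")
    print(json.dumps(normal_settings()))
```

Running with zero replicas means a node failure during the load can lose the new data, so this only makes sense when the batch can be replayed. Apply normal_settings() as soon as the load finishes.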

xrage answered Nov 15 '22 07:11