How to tune Elasticsearch for fast indexing?

Q: How can I increase my indexing speed?

Note that this tip applies to Windows Search, not Elasticsearch: go to Control Panel | Indexing Options to monitor the indexing. Setting DisableBackOff = 1 makes indexing run faster than the default value. You can continue to work on the computer; indexing continues in the background and is less likely to pause when other programs are running.

Q: Why is my Elasticsearch slow?

Slow queries are often caused by poorly written or expensive search queries, poorly configured Elasticsearch clusters or indices, or saturated CPU, memory, disk, and network resources on the cluster.

Q: What is the need for tuning the performance of Elasticsearch?

With Elasticsearch, you generally want the maximum and minimum heap values to match, to prevent the heap from resizing at runtime. So when you're testing heap values with your cluster, make sure that both values match. Elasticsearch's current guide states that there is an “ideal sweet spot” at around 64 GB of RAM per machine.
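The matching-heap rule can be sketched in Python as a small helper that produces the two JVM flags. The function name `heap_options`, the half-of-RAM rule of thumb, and the 31 GB cap are assumptions made for illustration (the cap reflects the common advice to keep the heap just below 32 GB); treat the result as a starting point to test, not a definitive setting.

```python
from typing import List


def heap_options(ram_gb: int, cap_gb: int = 31) -> List[str]:
    """Return JVM heap flags with identical minimum and maximum sizes.

    Assumption: give the heap about half of the machine's RAM, capped at
    cap_gb, and leave the rest to the operating system's file cache.
    """
    heap_gb = max(1, min(ram_gb // 2, cap_gb))
    # -Xms is the initial (minimum) heap, -Xmx the maximum heap.
    # Using the same value for both keeps the heap from resizing at runtime.
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


if __name__ == "__main__":
    for line in heap_options(64):
        print(line)
```

For a 64 GB machine this yields a 31 GB heap for both flags; for a 16 GB machine it yields 8 GB.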

Tags:

elasticsearch

My Elasticsearch cluster does not need to run any complicated queries. I am using Elasticsearch just for fast search performance on large datasets.

It is running fine. The search is simple and fast.

But as the index grows, adding new documents becomes slower and slower.

When the index is small, indexing 1 million documents takes about 250 seconds.
But when the same index reaches about 50 GB, indexing 1 million documents takes about 1000 seconds.
When the same index reaches 100 GB, indexing 1 million documents takes much longer.
And sometimes, while indexing 1 million documents, I see an Elasticsearch connection error. The error comes from the code near this line: "//<2.0 "i just blew up" nonstructured exception". I only see this error when I try to index 1 million documents into a large index (about 100 GB). When the index is smaller, I do not see this error in the log.
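To put the timings above in terms of throughput, here is a short Python sketch. The function name `docs_per_second` is illustrative; the numbers are the ones quoted in the question.

```python
def docs_per_second(doc_count: int, seconds: float) -> float:
    """Average indexing throughput for one batch."""
    return doc_count / seconds


# Timings reported in the question for a batch of 1 million documents.
small_index_rate = docs_per_second(1_000_000, 250)    # small index
large_index_rate = docs_per_second(1_000_000, 1000)   # index at about 50 GB

# How many times slower indexing became as the index grew.
slowdown = small_index_rate / large_index_rate

if __name__ == "__main__":
    print(small_index_rate, large_index_rate, slowdown)
```

That is a drop from 4000 to 1000 documents per second, a 4x slowdown by the time the index reaches about 50 GB.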

I would like to tune the Elasticsearch cluster so that it still returns search results fast, but can also index documents fast even when the index reaches 100 GB or more.

I would:

Use 3 nodes in one cluster. (I did not find a good answer on the number of nodes per cluster, so three seems to be a good number, as some articles suggested.)
Use 5 shards and 1 replica for each index. (I did not find a good number here either; this is the current default.)
Right now, I have 5 - 10 indices on one cluster; the cluster size is 1000 GB (300 GB used). Instead of running 10 indices on a 1000 GB cluster, if I ran one index per cluster (cluster size 200 GB), would that give better indexing and search performance?
The documents I add to the index are summarized, projected data. Each document has between 6 and 12 fields. I made most of the fields the keyword data type. If I mapped fewer fields as keyword, for example only half of them, how much would that improve indexing speed? (In my case, the index size reaches 100 GB, and each day I batch index 1 million documents into it.)
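As a rough aid for reasoning about the setup above, the following Python sketch works out shard sizes and counts. It assumes the 100 GB figure is the primary data size and that shards are spread evenly; the function name `shard_layout` and its fields are illustrative, not part of any Elasticsearch API.

```python
import math
from typing import Dict


def shard_layout(index_gb: float, primaries: int, replicas: int, nodes: int) -> Dict[str, float]:
    """Estimate per-shard size and shard copies for one index.

    Assumption: index_gb is the size of the primary data only, and
    documents are distributed evenly across the primary shards.
    """
    shard_copies = primaries * (1 + replicas)
    return {
        "gb_per_primary": index_gb / primaries,
        "shard_copies": shard_copies,
        # Upper bound on copies one node holds if they are balanced.
        "max_copies_per_node": math.ceil(shard_copies / nodes),
    }


if __name__ == "__main__":
    print(shard_layout(index_gb=100, primaries=5, replicas=1, nodes=3))
```

With 5 primaries, 1 replica and 3 nodes, a 100 GB index works out to about 20 GB per primary shard and 10 shard copies in total, at most 4 per node.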

So what changes can I make to the above setup to improve indexing speed and performance, and to reduce errors such as the Elasticsearch connection error during indexing?

I am using AWS hosted Elasticsearch.

What else could I do?

Thanks!


asked Oct 11 '18 03:10

searain

1 Answer

When you index documents, your ES cluster also replicates that data to other nodes. For better indexing performance, several improvements can be made.

1 - Set a large refresh_interval while indexing. A refresh makes newly indexed documents visible to search, so refreshing less often reduces overhead and makes indexing faster.

2 - Keep an optimum batch size while bulk indexing.

3 - Set the heap size properly. For example, for a 64 GB node, 31 GB should be the optimum heap. For details - https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html

4 - Increase file descriptors and MMap - https://www.elastic.co/guide/en/elasticsearch/guide/current/_file_descriptors_and_mmap.html

5 - If you are transforming your data during ingestion, a dedicated ingest node can be used - https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html

6 - Disable replication while indexing (you can enable it again after the big indexing job).
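Items 1, 2 and 6 above can be sketched in Python as plain request bodies, without assuming any particular client library. The helper names (`bulk_load_settings`, `restore_settings`, `chunked`, `to_bulk_ndjson`), the index name `sales`, the "30s" interval and the batch size are illustrative assumptions. The settings bodies are meant for the index `_settings` endpoint and the NDJSON text for the `_bulk` endpoint; the action line shown is the typeless form, and older versions also expect a `_type`.

```python
import json
from typing import Dict, Iterable, Iterator, List


def bulk_load_settings(refresh_interval: str = "30s") -> Dict:
    """Settings body to apply before a large load: rare refreshes, no replicas."""
    return {"index": {"refresh_interval": refresh_interval, "number_of_replicas": 0}}


def restore_settings(replicas: int = 1, refresh_interval: str = "1s") -> Dict:
    """Settings body to apply after the load to restore normal behaviour."""
    return {"index": {"refresh_interval": refresh_interval, "number_of_replicas": replicas}}


def chunked(docs: Iterable[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Yield documents in batches of at most batch_size."""
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def to_bulk_ndjson(index_name: str, batch: List[Dict]) -> str:
    """Serialize one batch as a bulk request body: action line, then document."""
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = ({"id": i, "region": "eu"} for i in range(5))
    print(json.dumps(bulk_load_settings()))
    for batch in chunked(docs, batch_size=2):
        print(to_bulk_ndjson("sales", batch), end="")
    print(json.dumps(restore_settings()))
```

The right batch size depends on document size and cluster capacity, so it is usually found by trying a few values and measuring throughput rather than by picking a fixed number.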


answered Nov 15 '22 07:11

xrage
