How to prevent Elasticsearch from index throttling?

I have a 40-node Elasticsearch cluster that is hammered by a high rate of index requests. Each node uses an SSD for the best performance. As suggested by several sources, I have tried to prevent index throttling with the following configuration:

indices.store.throttle.type: none

Unfortunately, I'm still seeing performance issues because the cluster still periodically throttles indexing, as the following logs confirm:

[2015-03-13 00:03:12,803][INFO ][index.engine.internal    ] [CO3SCH010160941] [siphonaudit_20150313][19] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
[2015-03-13 00:03:12,829][INFO ][index.engine.internal    ] [CO3SCH010160941] [siphonaudit_20150313][19] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
[2015-03-13 00:03:13,804][INFO ][index.engine.internal    ] [CO3SCH010160941] [siphonaudit_20150313][19] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
[2015-03-13 00:03:13,818][INFO ][index.engine.internal    ] [CO3SCH010160941] [siphonaudit_20150313][19] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
[2015-03-13 00:05:00,791][INFO ][index.engine.internal    ] [CO3SCH010160941] [siphon_20150313][6] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
[2015-03-13 00:05:00,808][INFO ][index.engine.internal    ] [CO3SCH010160941] [siphon_20150313][6] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
[2015-03-13 00:06:00,861][INFO ][index.engine.internal    ] [CO3SCH010160941] [siphon_20150313][6] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
[2015-03-13 00:06:00,879][INFO ][index.engine.internal    ] [CO3SCH010160941] [siphon_20150313][6] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5

The throttling occurs after one of the 40 nodes dies for various expected reasons. The cluster immediately enters a yellow state, in which a number of shards will begin initializing on the remaining nodes.

Any idea why the cluster continues to throttle after being explicitly configured not to? Any other suggestions for getting the cluster back to a green state more quickly after a node failure?

251

asked Mar 13 '15 19:03

grouma

2 Answers

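The setting that actually corresponds to maxNumMerges in the log file is index.merge.scheduler.max_merge_count. The store throttle setting in the question limits merge I/O bandwidth; the log lines come from a separate mechanism that throttles indexing once the number of in-flight merges exceeds that limit. Increasing max_merge_count along with index.merge.scheduler.max_thread_count (where max_thread_count <= max_merge_count) raises the number of simultaneous merges allowed for segments within an individual index's shards.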
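To make the constraint concrete, here is a minimal Python sketch that builds a settings body for the two scheduler settings and rejects combinations where max_thread_count exceeds max_merge_count. The helper name merge_scheduler_settings is made up for illustration; only the two setting names come from Elasticsearch.

```python
import json


def merge_scheduler_settings(max_thread_count, max_merge_count):
    """Build a settings body for the merge scheduler (hypothetical helper).

    Enforces the constraint max_thread_count <= max_merge_count.
    """
    if max_thread_count < 1:
        raise ValueError("max_thread_count must be at least 1")
    if max_thread_count > max_merge_count:
        raise ValueError("max_thread_count must not exceed max_merge_count")
    return {
        "index.merge.scheduler.max_thread_count": max_thread_count,
        "index.merge.scheduler.max_merge_count": max_merge_count,
    }


if __name__ == "__main__":
    # Prints a body that could be sent with PUT /index_name/_settings.
    print(json.dumps(merge_scheduler_settings(10, 10), indent=2))
```

Checking this on the client side only guards against typos; the cluster still decides what it accepts.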

If you have a very high indexing rate that results in many GBs in a single index, you probably also want to raise some of the defaults that Elasticsearch assumes for segment sizes. Try raising floor_segment (segments smaller than this are treated as this size when merges are selected, which discourages a long tail of tiny segments), max_merged_segment (the approximate maximum size of a single merged segment), and segments_per_tier (the number of segments of roughly equal size allowed before they get merged into a new tier). On an application that has a high indexing rate and finished index sizes of roughly 120GB with 10 shards per index, we use the following settings:

# Assumes a node reachable at localhost:9200; replace index_name with your index.
curl -XPUT 'http://localhost:9200/index_name/_settings' -d '
{
  "settings": {
    "index.merge.policy.max_merge_at_once": 10,
    "index.merge.scheduler.max_thread_count": 10,
    "index.merge.scheduler.max_merge_count": 10,
    "index.merge.policy.floor_segment": "100mb",
    "index.merge.policy.segments_per_tier": 25,
    "index.merge.policy.max_merged_segment": "10gb"
  }
}'
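If you would rather apply these from a script than from curl, something like the following Python sketch should work. It only builds the PUT request and does not send it. The base URL http://localhost:9200 and the helper name build_settings_request are assumptions for illustration, not part of the original setup.

```python
import json
import urllib.request

# Same values as the settings listed above.
MERGE_SETTINGS = {
    "index.merge.policy.max_merge_at_once": 10,
    "index.merge.scheduler.max_thread_count": 10,
    "index.merge.scheduler.max_merge_count": 10,
    "index.merge.policy.floor_segment": "100mb",
    "index.merge.policy.segments_per_tier": 25,
    "index.merge.policy.max_merged_segment": "10gb",
}


def build_settings_request(base_url, index_name, settings):
    """Build (but do not send) a PUT request to the index _settings endpoint."""
    url = "{}/{}/_settings".format(base_url.rstrip("/"), index_name)
    body = json.dumps({"settings": settings}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumed node address; adjust for your cluster.
    req = build_settings_request("http://localhost:9200", "index_name", MERGE_SETTINGS)
    print(req.get_method(), req.full_url)
    # Passing req to urllib.request.urlopen would actually send it;
    # that step is left out here on purpose.
```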

Also, one important thing you can do to improve recovery time after a lost or restarted node on applications with high indexing rates is to take advantage of index recovery prioritization (in ES >= 1.7). Tune this setting so that the indices that receive the most indexing activity are recovered first. As you may know, the "normal" shard initialization process just copies the already-indexed segment files between nodes. However, if indexing activity occurs against a shard before or during initialization, the translog holding the new documents can become very large. When merging goes through the roof during recovery, the replay of this translog against the shard is almost always the culprit. Thus, by using index recovery prioritization to recover those shards first and delay shards with less indexing activity, you can minimize the eventual size of the translog, which will dramatically improve recovery time.
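As a rough sketch of how the prioritization could be derived, the Python below ranks indices by indexing rate and produces one settings body per index, with the busiest index getting the highest value. As far as I know the setting is called index.priority and higher values are recovered first, but check the docs for your version. The rates and the index name archive are made-up sample inputs; in practice they would come from your own monitoring.

```python
import json


def recovery_priorities(indexing_rates):
    """Map index name -> settings body, busiest index gets the highest priority.

    indexing_rates: dict of index name -> documents indexed per second
    (hypothetical input, not something Elasticsearch hands you in this shape).
    """
    # Sort ascending by rate; ties are broken by name so the result is stable.
    ranked = sorted(indexing_rates, key=lambda name: (indexing_rates[name], name))
    return {
        name: {"index.priority": rank}
        for rank, name in enumerate(ranked, start=1)
    }


if __name__ == "__main__":
    sample_rates = {
        "siphon_20150313": 12000,       # made-up rate
        "siphonaudit_20150313": 5000,   # made-up rate
        "archive": 10,                  # hypothetical low-traffic index
    }
    for index_name, body in recovery_priorities(sample_rates).items():
        # Each body could be sent with PUT /<index_name>/_settings.
        print(index_name, json.dumps(body))
```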

187

answered Sep 20 '22 06:09

Dusty

We are using 1.7 and noticed a similar problem: indexing was being throttled even when the I/O was not saturated (Fusion IO in our case).

After increasing "index.merge.scheduler.max_thread_count" the problem seems to be gone; we have not seen any more throttling logged so far.

I would try setting "index.merge.scheduler.max_thread_count" to at least the max reported numMergesInFlight (6 in the logs above).
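To find that number without eyeballing the logs, a small Python sketch like this can pull the highest numMergesInFlight out of the "now throttling indexing" lines. The regex is written against the log format shown in the question and may need adjusting for other versions; the function name max_merges_in_flight is just for illustration.

```python
import re

# Matches e.g. "[siphon_20150313][6] now throttling indexing: numMergesInFlight=6, maxNumMerges=5"
THROTTLE_RE = re.compile(
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\] now throttling indexing: "
    r"numMergesInFlight=(?P<in_flight>\d+), maxNumMerges=(?P<max_merges>\d+)"
)


def max_merges_in_flight(log_lines):
    """Return the highest numMergesInFlight seen in 'now throttling' lines, or 0."""
    peak = 0
    for line in log_lines:
        match = THROTTLE_RE.search(line)
        if match:
            peak = max(peak, int(match.group("in_flight")))
    return peak


if __name__ == "__main__":
    sample = [
        "[2015-03-13 00:03:12,803][INFO ][index.engine.internal    ] [CO3SCH010160941] "
        "[siphonaudit_20150313][19] now throttling indexing: numMergesInFlight=6, maxNumMerges=5",
        "[2015-03-13 00:03:12,829][INFO ][index.engine.internal    ] [CO3SCH010160941] "
        "[siphonaudit_20150313][19] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5",
    ]
    print(max_merges_in_flight(sample))
```

The "stop throttling" lines are ignored on purpose, since only the peak at the moment throttling starts matters for sizing the setting.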

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/index-modules-merge.html#scheduling

Hope this helps!

answered Sep 20 '22 06:09

Sebastian Röbke


Tags: performance, indexing, elasticsearch
