
How to speed up Elasticsearch recovery?

I'm working on an ES cluster holding 6B small documents, organized into 6.5K indexes, for a total of 6TB. The indexes are replicated and sharded across 7 servers. Index size varies from a few KB to hundreds of GB.

Before using ES, I used Lucene with the same document organization.

Recovery of the Lucene-based application was almost immediate: the indexes were lazily loaded when a query arrived, and the IndexReaders were then cached to speed up future replies.

Now, with Elasticsearch, recovery is very slow (tens of minutes). Note that before a crash all the indexes are usually open, and most of them receive documents to index quite often.

Is there any good pattern to reduce the ES recovery time? I'm also interested in anything related to index management, not only configuration. For example, I would like to recover the most important indexes first and then load all the others; that way I can reduce the perceived downtime for most users.

I'm using the following configuration:

# Max number of concurrent streams open per node for recovering a shard from a peer
indices.recovery.concurrent_streams: 80

# Max number of bytes per second read per node when recovering shards
indices.recovery.max_bytes_per_sec: 250mb

# Number of initial primary-shard recoveries allowed per node
cluster.routing.allocation.node_initial_primaries_recoveries: 20

# Max number of concurrent shard recoveries allowed per node
cluster.routing.allocation.node_concurrent_recoveries: 80

# Number of streams to open per node for small files (under 5mb) when recovering a shard from a peer
indices.recovery.concurrent_small_file_streams: 30

PS: Right now I'm using ES 2.4.1, but I will move to ES 5.2 in a few weeks.

PPS: One scenario to consider is a recovery after a blackout.
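For reference, most of these recovery throttles are dynamic, so they can also be changed at runtime via the cluster settings API instead of editing elasticsearch.yml and restarting (a sketch; the values are the ones from the config above and should be adjusted to your hardware):

```shell
# Raise the recovery throttles at runtime; "transient" settings reset on a full cluster restart.
# On 5.x+, also add: -H 'Content-Type: application/json'
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "250mb",
    "cluster.routing.allocation.node_concurrent_recoveries": 80
  }
}'
```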

Thank you!

Asked Mar 21 '17 by Luca Mastrostefano

People also ask

What is Elasticsearch recovery?

In Elasticsearch, recovery refers to the process of recovering an index or shard when something goes wrong. There are many ways to recover an index or shard, such as re-indexing the data from a backup/failover cluster into the current one, or restoring from an Elasticsearch snapshot.
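As a concrete illustration of the snapshot path (the repository name, location, snapshot name, and index pattern here are hypothetical), registering a shared-filesystem repository and restoring from it looks like:

```shell
# Register a shared-filesystem repository; the location must be listed
# under path.repo in elasticsearch.yml on every node.
curl -XPUT 'localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/es_backups" }
}'

# Restore a snapshot, optionally limiting it to the most important indexes first.
curl -XPOST 'localhost:9200/_snapshot/my_backup/snapshot_1/_restore' -d '{
  "indices": "important_index_*"
}'
```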

What is refresh interval Elasticsearch?

By default, Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds. You can change this default interval using the index.refresh_interval setting.

What does recovery Shard do?

Shard recovery is the process of syncing a replica shard from a primary shard.


1 Answer

Edit: To prioritize recovery of certain indexes, you can use the priority setting on an index like this:

PUT some_index
{
  "settings": {
    "index.priority": 10
  }
}

The index with the highest priority will be recovered first; otherwise, recovery is ordered by the creation date of the index (see the recovery prioritization documentation).
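The ordering described above can be sketched with a toy model (this is not ES source code, just an illustration: highest priority first, then newest creation date first):

```python
# Toy model of Elasticsearch's recovery ordering:
# higher index.priority wins, ties broken by newer creation date.
def recovery_order(indices):
    return sorted(
        indices,
        key=lambda ix: (ix["priority"], ix["created"], ix["name"]),
        reverse=True,
    )

indices = [
    {"name": "logs-old", "priority": 1,  "created": 100},
    {"name": "users",    "priority": 10, "created": 50},
    {"name": "logs-new", "priority": 1,  "created": 200},
]

# "users" has the highest priority, so it is recovered first even though
# it is older; "logs-new" beats "logs-old" on creation date.
print([ix["name"] for ix in recovery_order(indices)])
# → ['users', 'logs-new', 'logs-old']
```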

Second Edit: To change the number of replicas, you simply need an HTTP request:

PUT index_name/_settings
{
  "index":{
    "number_of_replicas" : "0"
  }
}

Regarding snapshot recovery, I would suggest the following (some points might not apply to your case):

  • set the number of replicas to 0 before the recovery, then put it back to its default value afterwards (less writing)
  • if using spinning disks, you can add index.merge.scheduler.max_thread_count: 1 to elasticsearch.yml to increase indexing speed (see here)
  • before the recovery, update your index settings with "refresh_interval": "-1", and put it back to its default value afterwards (see the doc)
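Putting those bullet points together, a sketch of the before/after requests (the index name and the "after" values are examples; restore whatever defaults your index actually uses):

```shell
# Before recovery: drop replicas and disable refresh to reduce I/O.
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index": { "number_of_replicas": 0, "refresh_interval": "-1" }
}'

# After recovery: restore the defaults.
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index": { "number_of_replicas": 1, "refresh_interval": "1s" }
}'
```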

If you don't care about searching yet, the following on your ES5 cluster could also help:

PUT /_cluster/settings
{
    "transient" : {
        "indices.store.throttle.type" : "none" 
    }
}

A few articles that could help:

  • https://www.elastic.co/guide/en/elasticsearch/reference/5.x/tune-for-indexing-speed.html
  • https://www.elastic.co/guide/en/elasticsearch/reference/5.x/tune-for-disk-usage.html

A few general tips: make sure swapping is disabled. How much memory is allocated to the nodes in your ES cluster? (You should give the heap half of a node's total memory, capped at ~32 GB because of the JVM's compressed-pointer addressing limit.)
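For the heap sizing point, the setting looks like this (31g on a 64 GB node is just an example; ES 2.x reads an environment variable, while 5.x uses config/jvm.options):

```shell
# ES 2.x: set the heap via an environment variable before starting the node.
export ES_HEAP_SIZE=31g

# ES 5.x: set min and max heap to the same value in config/jvm.options:
#   -Xms31g
#   -Xmx31g
```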

Answered Oct 11 '22 by Adonis