 

Elasticsearch hot-backup strategies


It would be interesting if someone could share their best 'hot-backup' strategies for Elasticsearch.

Also, feel free to share related tools and libraries that can help with this problem.

Update: Thank you @javanna for your response; it's quite complete and gives good direction for further actions.

I also did some research and found some articles/discussions that might help if anyone is interested.

  • Elasticsearch backup strategies
  • Backup/restore Elasticsearch index and related snippet on github:gist
  • Elastic Search Backup and Recovery discussion (check Paul Smith's comment; he also shared a useful link to his tool for verifying indexes)

Update: Elasticsearch 1.0 has an "official" backup solution, the Snapshot/Restore API, and this is the only right way to do it now. Elasticsearch identifies the primary shards and takes care of consistency. Backups are done incrementally, so you can run them quickly and as often as you want.
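For illustration, here is a minimal sketch of driving the Snapshot/Restore API over HTTP with Python's requests library. The node address, repository name, snapshot name, and backup path are all assumptions; the shared filesystem location must be reachable from every node.

    import requests

    ES = "http://localhost:9200"  # assumed node address

    # Register a shared-filesystem repository; the location must be a
    # path every node in the cluster can reach (e.g. an NFS mount).
    requests.put(ES + "/_snapshot/my_backup", json={
        "type": "fs",
        "settings": {"location": "/mount/backups/my_backup"},
    }).raise_for_status()

    # Take a snapshot; after the first run, only changed data is copied.
    requests.put(ES + "/_snapshot/my_backup/snapshot_1",
                 params={"wait_for_completion": "true"}).raise_for_status()

    # To restore later:
    # requests.post(ES + "/_snapshot/my_backup/snapshot_1/_restore")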

asked Oct 11 '12 by gakhov



1 Answer

Replicas are a sort of backup, and Elasticsearch never allocates a replica on the same node as its primary shard. But there is still a risk of losing data, depending on how many shards, replicas, and nodes you have in your cluster.
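As a small aside, the replica count can be changed on a live index through the update-settings API. A minimal sketch using Python's requests, where the index name and node address are assumptions:

    import requests

    ES = "http://localhost:9200"  # assumed node address

    # Raise the replica count of a live index; "my_index" is illustrative.
    requests.put(ES + "/my_index/_settings", json={
        "index": {"number_of_replicas": 2},
    }).raise_for_status()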

I would look at the Gateway module, through which you can save the index and the cluster metadata. There are different types of gateway. I'd look at the Shared FS gateway, for example, which lets you copy the index and the metadata to a file system shared between all your nodes. You can also manually trigger a snapshot through the Gateway Snapshot API.
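If it helps, this is roughly how a manual gateway snapshot was triggered in pre-1.0 releases; the endpoint has long since been removed, and the node address is an assumption:

    import requests

    ES = "http://localhost:9200"  # assumed node address

    # Trigger a gateway snapshot for all indices (pre-1.0 API, since removed).
    requests.post(ES + "/_gateway/snapshot").raise_for_status()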

Also, you can make a copy of the data directory (on every node) once you've disabled flush through the index.translog.disable_flush index setting. That way you make sure no Lucene commit is issued while you're copying. After you've made the copy, you need to enable flush again.
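A rough sketch of that sequence, assuming a single-node setup with a local data directory; the paths and node address are illustrative, and on a real cluster the copy step has to run on every node:

    import shutil

    import requests

    ES = "http://localhost:9200"  # assumed node address

    def set_disable_flush(disabled):
        # Applies the setting named above to all indices.
        requests.put(ES + "/_settings", json={
            "index": {"translog.disable_flush": disabled},
        }).raise_for_status()

    set_disable_flush(True)   # stop Lucene commits while copying
    try:
        # Illustrative paths; repeat on every node in the cluster.
        shutil.copytree("/var/lib/elasticsearch/data", "/backups/es-data-copy")
    finally:
        set_disable_flush(False)  # re-enable flush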

UPDATE

All the gateway types except for the local one have been deprecated and will be removed in a future version. Elasticsearch 1.0 will be released with a better backup solution.

answered Oct 12 '22 by javanna