
Cross data centre replication strategy in Elasticsearch

For on-demand backup, we have two clusters holding the same data. One is the primary production cluster and the other is the failover. What are my best options for real-time replication from one cluster to the other? In this scenario, if one cluster fails, we should be able to fail over to the other one immediately. Can we use replicas for this?

Binoy Bhanujan asked May 04 '15 10:05



People also ask

What is cross-cluster replication in elastic?

Cross-cluster replication works by replaying the history of individual write operations that were performed on the shards of the leader index. Elasticsearch needs to retain the history of these operations on the leader shards so that they can be pulled by the follower shard tasks.

How does Elasticsearch data replication work?

Each index in Elasticsearch is divided into shards and each shard can have multiple copies. These copies are known as a replication group and must be kept in sync when documents are added or removed. If we fail to do so, reading from one copy will result in very different results than reading from another.

What is Elasticsearch CCR?

CCR is designed around an active-passive index model. An index in one Elasticsearch cluster can be configured to replicate changes from an index in another Elasticsearch cluster. The index that is replicating changes is termed a “follower index” and the index being replicated from is termed the “leader index”.
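To make the leader/follower relationship concrete, here is a minimal sketch of the request that creates a follower index through the CCR follow API. It only builds the request and sends nothing. The names `products`, `products-copy`, and `dc1` are hypothetical, and the sketch assumes a remote cluster alias has already been configured on the follower cluster and that the Elasticsearch version in use supports CCR.

```python
def follow_request(follower_index, remote_cluster, leader_index):
    """Build the CCR request that makes follower_index replicate leader_index.

    remote_cluster is the alias under which the leader's cluster is registered
    on the follower cluster (assumed to be configured already).
    Returns a (method, path, body) tuple; nothing is sent over the network.
    """
    body = {"remote_cluster": remote_cluster, "leader_index": leader_index}
    return ("PUT", f"/{follower_index}/_ccr/follow", body)


# Hypothetical names, for illustration only.
method, path, body = follow_request("products-copy", "dc1", "products")
```

The request is issued against the follower cluster, which then pulls changes from the leader. This matches the active-passive model described above: writes go to the leader index only.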


1 Answer

Elasticsearch has poor support for cross-datacentre replication. One approach that we have tried, and that works fine for our kind of volume, is as follows. In one data center we take a snapshot of the ES cluster to S3, and in the other data center we restore from that same S3 location. We do this at regular intervals so that the data stays consistent across both data centers. Snapshot and restore are incremental, which makes them a good fit for this problem: only new data is moved to the other data center. Although this is not real time, it still fits the bill for us.
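The snapshot-and-restore cycle described above could look roughly like the following sketch, which uses the snapshot REST API and only the Python standard library. The repository name, bucket name, index names, and cluster URLs are all hypothetical, and it assumes S3 repository support is installed on both clusters. It is an illustration of the idea, not the exact setup used here.

```python
import json
import urllib.request


def register_s3_repo(repo, bucket, readonly=False):
    """Request that registers an S3-backed snapshot repository."""
    settings = {"bucket": bucket}
    if readonly:
        # The restoring cluster only needs to read from the repository.
        settings["readonly"] = True
    return ("PUT", f"/_snapshot/{repo}", {"type": "s3", "settings": settings})


def create_snapshot(repo, snapshot, indices):
    """Request that snapshots the given indices (run on the primary cluster)."""
    path = f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return ("PUT", path, {"indices": ",".join(indices)})


def restore_snapshot(repo, snapshot, indices):
    """Request that restores the given indices (run on the failover cluster)."""
    path = f"/_snapshot/{repo}/{snapshot}/_restore?wait_for_completion=true"
    return ("POST", path, {"indices": ",".join(indices)})


def send(base_url, request):
    """Send a (method, path, body) request to a cluster and return parsed JSON."""
    method, path, body = request
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical names; a scheduler such as cron would run this cycle at intervals.
snap = create_snapshot("dc_backup", "snap-1", ["logs", "users"])
restore = restore_snapshot("dc_backup", "snap-1", ["logs", "users"])
# send("http://primary-dc:9200", register_s3_repo("dc_backup", "my-es-bucket"))
# send("http://primary-dc:9200", snap)
# send("http://failover-dc:9200", register_s3_repo("dc_backup", "my-es-bucket", readonly=True))
# send("http://failover-dc:9200", restore)
```

One thing to keep in mind: an index that already exists on the failover cluster has to be closed (or deleted) before it can be restored over, so the restoring side is briefly unavailable for that index during each cycle.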

Vineeth Mohan answered Sep 22 '22 22:09
