
AWS MySQL RDS fail over - replication lag handling?

In a normal MySQL replication setup, when the primary has an issue, chances are that the slaves are lagging behind and don't have the latest data.

In AWS RDS, when a slave is automatically promoted to master, my questions are:

  1. Is the lagged data forever lost?
  2. If the primary DB is up again, will there be any conflict?
  3. In my application do I need to do some special handling in querying the DB?
Ryan asked May 29 '19 11:05


1 Answer

The first thing to point out is that MySQL RDS read replicas use asynchronous replication, so changes reach the replica only after the transaction has already been executed on the master instance. If the master instance fails, then yes, there is a chance that you lose a small amount of data.


As the AWS documentation puts it, RDS then uses the engine's native asynchronous replication to update the read replica whenever there is a change to the source DB instance.

The steps to follow when promoting a read replica are:

  • Stop any transactions from being written to the read replica source DB instance, and then wait for all updates to be made to the read replica. Database updates occur on the read replica after they have occurred on the source DB instance, and this replication lag can vary significantly. Use the Replica Lag metric to determine when all updates have been made to the read replica.
  • For MySQL and MariaDB only: If you need to make changes to the MySQL or MariaDB read replica, you must set the read_only parameter to 0 in the DB parameter group for the read replica. You can then perform all needed DDL operations, such as creating indexes, on the read replica. Actions taken on the read replica don't affect the performance of the source DB instance.
  • Promote the read replica by using the Promote option on the Amazon RDS console, the AWS CLI command promote-read-replica, or the PromoteReadReplica Amazon RDS API operation.
  • (Optional) Modify the new DB instance to be a Multi-AZ deployment. For more information, see Modifying an Amazon RDS DB Instance and High Availability (Multi-AZ) for Amazon RDS.
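As a rough sketch of the first and third steps with boto3: poll the ReplicaLag CloudWatch metric and only promote once it reads zero. The function names, the timeout and polling values, and the replica identifier are my own placeholders, and this assumes writes to the source have already been stopped.

```python
import time
from datetime import datetime, timedelta, timezone


def latest_replica_lag(cloudwatch, replica_id):
    """Return the newest ReplicaLag datapoint in seconds, or None if there is no data."""
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="ReplicaLag",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": replica_id}],
        StartTime=now - timedelta(minutes=5),
        EndTime=now,
        Period=60,
        Statistics=["Maximum"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Maximum"] if points else None


def promote_when_caught_up(rds, cloudwatch, replica_id,
                           timeout_s=600, poll_s=30, sleep=time.sleep):
    """Promote the replica once its lag is 0; return False if the timeout expires first.

    `rds` and `cloudwatch` are expected to be boto3 clients (or objects with
    the same methods). The default timings are arbitrary assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        lag = latest_replica_lag(cloudwatch, replica_id)
        # Anything other than exactly 0 (missing data, positive lag, an error
        # value) is treated as "not caught up yet".
        if lag == 0:
            rds.promote_read_replica(DBInstanceIdentifier=replica_id)
            return True
        sleep(poll_s)
    return False
```

You would call it with `boto3.client("rds")`, `boto3.client("cloudwatch")` and your replica's DB instance identifier. If the source is already dead, the lag will never reach zero, so in that case you would promote without waiting and accept the loss.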

If you promote the read replica, it becomes a standalone RDS instance at that point and will only contain the transactions it had received up until the promotion. Any other read replicas will remain with the original source instance.

Even if the primary DB comes back, your promoted RDS instance is by then independent of it, and the promotion cannot be reversed. If the two differ by any transactions, those will need to be applied manually.

For your application, the major change is that the database DNS name has changed. I would advise creating or using a private Route 53 hosted zone and creating a CNAME record pointing to the original RDS endpoint. Once you have done this, update your application to use the CNAME in your private hosted zone.

If you ever need to promote the read replica, you then just need to update the CNAME value in Route 53 to the new RDS endpoint. If you do use this, remember to keep the TTL low on your Route 53 record so that failover is quick.
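The CNAME switch can be scripted too. Here is a minimal sketch using the Route 53 `change_resource_record_sets` call via boto3; the function names, the 60 second TTL, and the record and zone values are illustrative assumptions, not something RDS sets up for you.

```python
def cname_upsert_batch(record_name, target, ttl=60):
    """Build a Route 53 change batch that creates or overwrites one CNAME record."""
    return {
        "Comment": "Repoint database alias after read replica promotion",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,  # keep this low so clients pick up the change quickly
                    "ResourceRecords": [{"Value": target}],
                },
            }
        ],
    }


def repoint_database_alias(route53, hosted_zone_id, record_name, new_endpoint, ttl=60):
    """Point the application's database CNAME at the promoted instance's endpoint.

    `route53` is expected to be a boto3 Route 53 client (or an object with
    the same method).
    """
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=cname_upsert_batch(record_name, new_endpoint, ttl),
    )
```

You would pass in `boto3.client("route53")`, the ID of your private hosted zone, the alias your application connects to, and the endpoint of the promoted instance. Bear in mind that connection pools and DNS caches in the application may hold on to the old address for longer than the TTL.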

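Alternatively, if you can use a Multi-AZ setup, RDS will perform the failover automatically for you. The standby in a Multi-AZ deployment is kept up to date with synchronous replication and the instance endpoint stays the same, so you do not need to promote anything or change DNS yourself.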

To summarise, the answers to your three questions:

  • Is the lagged data forever lost? - Once you promote the read replica, the relationship between the two instances is broken, so anything not yet replicated will never be replicated.
  • If the primary DB is up again, will there be any conflict? - No conflict from the RDS perspective; they are now two standalone RDS instances.
  • In my application do I need to do some special handling in querying the DB? - Ensure your application connects to the DNS name of the newly promoted RDS instance. Try using a Route 53 private hosted zone to reduce the risk of the application still pointing at the legacy CNAME.
Chris Williams answered Oct 07 '22 19:10
