 

How is the Multi-AZ deployment of Amazon RDS realized?

Tags:

amazon-rds

I'm considering using an Amazon RDS Multi-AZ deployment for a service in a production environment, and I've read the related documentation.

However, I have a question about the failover. In the FAQ of Amazon RDS, failover is described as follows:

Q: What happens during Multi-AZ failover and how long does it take?

Failover is automatically handled by Amazon RDS so that you can resume database operations as quickly as possible without administrative intervention. When failing over, Amazon RDS simply flips the canonical name record (CNAME) for your DB Instance to point at the standby, which is in turn promoted to become the new primary. We encourage you to follow best practices and implement database connection retry at the application layer. Failover times are a function of the time it takes crash recovery to complete. Start-to-finish, failover typically completes within three minutes.

From the above description, I guess there must be a monitoring service that detects failure of the primary instance and performs the flip.

My question is: which AZ hosts this monitoring service? There are 3 possibilities:

  1. Same AZ as the primary
  2. Same AZ as the standby
  3. Another AZ

Options 1 and 2 can't be the case, since neither could handle an entire AZ becoming unavailable. So if it's option 3, what happens if the AZ hosting the monitoring service goes down? Is there another service monitoring the monitoring service? It looks like an endless chain of dominoes.

So, how is Amazon ensuring the availability of RDS in Multi-AZ deployment?

asked Jun 27 '12 at 07:06 by ciphor




1 Answer

So, how is Amazon ensuring the availability of RDS in Multi-AZ deployment?

I think the "how" here is abstracted away from the user by design, given that RDS is a managed (PaaS) service. A multi-AZ deployment hides a great deal; however, the following are true:

  • You don't have any access to the secondary instance, unless a failover occurs
  • You are guaranteed that a secondary instance is located in a separate AZ from the primary

In his blog post, John Gemignani mentions the notion of an observer managing which RDS instance is active in the multi-AZ architecture. But to your point, what is the observer? And where is it observing from?

Here's my guess, based upon my experience with AWS:

The observer in an RDS multi-AZ deployment is a highly available service that is deployed across every AZ in every region where RDS multi-AZ is available, and that uses existing AWS platform services to monitor the health and state of all of the infrastructure that may affect an RDS instance. Some of the services that make up the observer may be part of the AWS platform itself, and otherwise hidden from the user.
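To make that guess concrete, here is a minimal sketch of how an observer spread across several AZs could avoid the "endless domino" from the question. It assumes a simple majority vote among observers, which is purely my illustration and not anything AWS has documented; `HealthReport` and `should_fail_over` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class HealthReport:
    """One observer's view of the primary (hypothetical structure)."""
    observer_az: str       # AZ the observer runs in
    primary_healthy: bool  # result of that observer's probe of the primary


def should_fail_over(reports: Iterable[HealthReport], total_observers: int) -> bool:
    """Return True only if a strict majority of ALL observers see the primary as down.

    Observers that are themselves unreachable simply send no report, so losing
    the AZ that hosts one observer can neither force nor block a failover.
    """
    # Count each AZ at most once, even if it reports several times.
    unhealthy_azs = {r.observer_az for r in reports if not r.primary_healthy}
    return len(unhealthy_azs) > total_observers // 2
```

With three observers, losing the primary's AZ still leaves two observers that can agree on a failover, while losing an observer's AZ alone leaves the healthy primary untouched. No single AZ has to be monitored by yet another monitor.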

I would be willing to bet that the same underlying services that comprise CloudWatch Events are used in some capacity for the RDS multi-AZ observer. In his blog post announcing CloudWatch Events, Jeff Barr describes the service this way:

You can think of CloudWatch Events as the central nervous system for your AWS environment. It is wired in to every nook and cranny of the supported services, and becomes aware of operational changes as they happen. Then, driven by your rules, it activates functions and sends messages (activating muscles, if you will) to respond to the environment, making changes, capturing state information, or taking corrective action.

Think of the observer the same way: it's a component of the AWS platform that provides a function that we, as users of the platform, do not need to think about. It falls under AWS's responsibility in the Shared Responsibility Model.
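What does stay on our side is the connection retry that the FAQ quoted in the question recommends. Here is a minimal sketch, assuming your driver's connect call is wrapped in a zero-argument callable that raises `ConnectionError` on failure; `connect_with_retry` and its parameters are hypothetical names, and real drivers raise their own exception types that you would catch instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 10,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, backing off exponentially between tries.

    Re-raises the last ConnectionError once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Delay doubles each time (1s, 2s, 4s, ...) up to max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

The defaults are illustrative; tune them so the total wait comfortably covers the failover window the FAQ describes. Because the failover works by flipping the CNAME, make sure each retry opens a fresh connection to the endpoint hostname and does not reuse a cached IP address.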

answered Sep 23 '22 at 14:09 by cerberus