How is the Multi-AZ deployment of Amazon RDS realized?

Tags:

amazon-rds

I'm considering using an Amazon RDS Multi-AZ deployment for a service in a production environment, and I've read the related documentation.

However, I have a question about the failover. In the FAQ of Amazon RDS, failover is described as follows:

Q: What happens during Multi-AZ failover and how long does it take?

Failover is automatically handled by Amazon RDS so that you can resume database operations as quickly as possible without administrative intervention. When failing over, Amazon RDS simply flips the canonical name record (CNAME) for your DB Instance to point at the standby, which is in turn promoted to become the new primary. We encourage you to follow best practices and implement database connection retry at the application layer. Failover times are a function of the time it takes crash recovery to complete. Start-to-finish, failover typically completes within three minutes.

From this description, I assume there must be a monitoring service that detects failure of the primary instance and performs the flip.

My question is: which AZ hosts this monitoring service? There are three possibilities:

1. The same AZ as the primary
2. The same AZ as the standby
3. Another AZ

Clearly 1 and 2 can't be the case, since neither could handle an entire AZ becoming unavailable. But if 3 is the case, what happens when the AZ of the monitoring service goes down? Is there another service that monitors the monitoring service? It seems like an endless chain of dominoes.

So, how is Amazon ensuring the availability of RDS in Multi-AZ deployment?


asked Jun 27 '12 07:06

ciphor

1 Answer

So, how is Amazon ensuring the availability of RDS in Multi-AZ deployment?

I think the "how" in this case is abstracted away from the user by design, given that RDS is a PaaS service. A Multi-AZ deployment hides a great deal, but the following are true:

- You don't have any access to the secondary instance unless a failover occurs.
- You are guaranteed that the secondary instance is located in a separate AZ from the primary.

In his blog post, John Gemignani mentions the notion of an observer managing which RDS instance is active in the multi-AZ architecture. But to your point, what is the observer? And where is it observing from?

Here's my guess, based upon my experience with AWS:

The observer in an RDS Multi-AZ deployment is a highly available service that is deployed across every AZ in every region where RDS Multi-AZ is available, and that uses existing AWS platform services to monitor the health and state of all the infrastructure that may affect an RDS instance. Some of the services that make up the observer may be part of the AWS platform itself, and otherwise hidden from the user.
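To make that guess concrete: one common, generic way to avoid the "who monitors the monitor" problem is to run an observer in each AZ and act only on a majority vote. The sketch below is purely illustrative and says nothing about how AWS actually implements this; the function name `primary_is_down` and the AZ labels are hypothetical.

```python
from typing import Mapping, Optional


def primary_is_down(votes: Mapping[str, Optional[bool]]) -> bool:
    """Decide on failover from per-AZ observer reports (hypothetical model).

    votes maps an observer's AZ to:
      True  - this observer can reach the primary
      False - this observer cannot reach the primary
      None  - this observer is itself unreachable (e.g. its AZ is down)
    """
    unhealthy = sum(1 for vote in votes.values() if vote is False)
    # Require a strict majority of ALL observers, so neither a single lost
    # observer nor a single partitioned one can trigger a failover alone.
    return unhealthy > len(votes) // 2


# The primary's AZ (az-a) is gone, taking its observer with it;
# the two remaining observers still form a majority.
print(primary_is_down({"az-a": None, "az-b": False, "az-c": False}))
```

With three observers, losing any one AZ still leaves two that can agree, so no extra layer of "monitor monitors" is needed.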

I would be willing to bet that the same underlying services that make up CloudWatch Events are used in some capacity for the RDS Multi-AZ observer. In his blog post announcing CloudWatch Events, Jeff Barr describes the service this way:

You can think of CloudWatch Events as the central nervous system for your AWS environment. It is wired in to every nook and cranny of the supported services, and becomes aware of operational changes as they happen. Then, driven by your rules, it activates functions and sends messages (activating muscles, if you will) to respond to the environment, making changes, capturing state information, or taking corrective action.

Think of the observer the same way: it's a component of the AWS platform that provides a function that we, as users of the platform, do not need to think about. It's part of AWS's responsibility in the Shared Responsibility Model.
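What does remain your responsibility is the piece the FAQ quoted in the question calls out: connection retry at the application layer. Here is a minimal sketch of that idea in Python, assuming a caller-supplied `connect` callable. The name `connect_with_retry`, its parameters, and the backoff values are all illustrative and not part of any AWS or database driver API.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, backing off exponentially.

    connect should open a brand-new connection on every call, so the
    endpoint hostname is looked up again on each attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 1s, 2s, 4s, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

In practice you would pass a small function or lambda that opens a connection to the RDS endpoint hostname with your database driver, and set `retry_on` to that driver's connection error type. Tune `attempts` and the delays so the total wait covers the failover window the FAQ describes, and keep in mind that DNS caching in the OS or runtime can delay when a client sees the flipped CNAME.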

answered Sep 23 '22 14:09

cerberus
