My question has two parts:
With respect to part (1): If I understand correctly, all instance modifications are made on the standby and then AWS fails over by flipping the CNAME over to the standby as the primary is updated, so if I were to make any kind of instance modification and select "apply immediately," it should cause a failover, correct?
With respect to part (2): I am looking specifically for a way of monitoring the failover of an Oracle RDS instance, whether through a lambda function, a bash script, or some other means. As far as I can tell, it is not possible to use ping with RDS, even when I allow all ICMP traffic via the security group. I can connect without trouble using telnet or an SQL client. What I would like though is some way of doing something like periodically pinging the database during a failover to see when the IP associated with the connection string switches over and how long it takes. Any suggestions?
To see if a failover has occurred, open the Amazon RDS console, and then choose Events from the navigation pane. If AWS CloudTrail logging is enabled, then you can check the logs to see whether the event was planned or unplanned. For example, scaling compute or applying pending OS upgrades can trigger a failover.
In an Amazon RDS Multi-AZ deployment, Amazon RDS automatically creates a primary database (DB) instance and synchronously replicates the data to an instance in a different AZ. When it detects a failure, Amazon RDS automatically fails over to a standby instance without manual intervention.
According to the SLA's provided by AWS, whenever an instance marked as multi-AZ goes through a failure (whether it is a network failure, disk failure, etc); AWS automatically shifts the traffic to its standby running on a separate AZ on the same AWS region.
The availability benefits of Multi-AZ deployments also extend to planned maintenance and backups. In the case of system upgrades like OS patching or DB Instance scaling, these operations are applied first on the standby, prior to the automatic failover. As a result, your availability impact is, again, only the time required for automatic failover to complete.
To simulate failover, simply reboot with failover when rebooting, instead of rebooting both. From the linked documentation:
Reboot with failover is beneficial when you want to simulate a failure of a DB instance for testing, or restore operations to the original AZ after a failover occurs.
Additional Resources:
Update on this:
I ended up using a simple bash script:
date; while true; date; do nc -vz DBNAME.REGION.rds.amazonaws.com PORT; sleep 1; done
Note: the above is for netcat-openbsd
. If using netcat-traditional
, you'll need to modify this.
This polls the database each second to see if it's still possible to connect. Typically when I ran this and then initiated reboot with failover, the connection would simply dangle during the failover then display a timeout error when the failover was complete and connectivity resumed, presumably because the failover usually takes longer than the reboot. If the reboot happens to take longer than the failover though, there may be a period of time during which the connection is refused as the reboot completes. In any case, using this method, I was able to get a consistent failover time of 2:08.
It seeems, however, that unlike I originally thought, most instance modifications do not involve a failover at all. I have tested resizing the instance as well as changing the option groups and parameter groups and did not experience any downtime.
Changing the database engine does result in a failover.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With