I am currently reading this: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-auto-failover-group, and I have a hard time understanding the automatic failover policy:
By default, a failover group is configured with an automatic failover policy. The SQL Database service triggers failover after the failure is detected and the grace period has expired. The system must verify that the outage cannot be mitigated by the built-in high availability infrastructure of the SQL Database service due to the scale of the impact. If you want to control the failover workflow from the application, you can turn off automatic failover.
When defining the failover group in an ARM template:
{
"condition": "[equals(parameters('redundancyId'), 'pri')]",
"type": "Microsoft.Sql/servers",
"kind": "v12.0",
"name": "[variables('sqlServerPrimaryName')]",
"apiVersion": "2014-04-01-preview",
"location": "[parameters('location')]",
"properties": {
"administratorLogin": "[parameters('sqlServerPrimaryAdminUsername')]",
"administratorLoginPassword": "[parameters('sqlServerPrimaryAdminPassword')]",
"version": "12.0"
},
"resources": [
{
"condition": "[equals(parameters('redundancyId'), 'pri')]",
"apiVersion": "2015-05-01-preview",
"type": "failoverGroups",
"name": "[variables('sqlFailoverGroupName')]",
"properties": {
"serverName": "[variables('sqlServerPrimaryName')]",
"partnerServers": [
{
"id": "[resourceId('Microsoft.Sql/servers/', variables('sqlServerSecondaryName'))]"
}
],
"readWriteEndpoint": {
"failoverPolicy": "Automatic",
"failoverWithDataLossGracePeriodMinutes": 60
},
"readOnlyEndpoint": {
"failoverPolicy": "Disabled"
},
"databases": [
"[resourceId('Microsoft.Sql/servers/databases', variables('sqlServerPrimaryName'), variables('sqlDatabaseName'))]"
]
},
"dependsOn": [
"[variables('sqlServerPrimaryName')]",
"[resourceId('Microsoft.Sql/servers/databases', variables('sqlServerPrimaryName'), variables('sqlDatabaseName'))]",
"[resourceId('Microsoft.Sql/servers', variables('sqlServerSecondaryName'))]"
]
},
{
"condition": "[equals(parameters('redundancyId'), 'pri')]",
"name": "[variables('sqlDatabaseName')]",
"type": "databases",
"apiVersion": "2014-04-01-preview",
"location": "[parameters('location')]",
"dependsOn": [
"[variables('sqlServerPrimaryName')]"
],
"properties": {
"edition": "[variables('sqlDatabaseEdition')]",
"requestedServiceObjectiveName": "[variables('sqlDatabaseServiceObjective')]"
}
}
]
},
{
"condition": "[equals(parameters('redundancyId'), 'pri')]",
"type": "Microsoft.Sql/servers",
"kind": "v12.0",
"name": "[variables('sqlServerSecondaryName')]",
"apiVersion": "2014-04-01-preview",
"location": "[variables('sqlServerSecondaryRegion')]",
"properties": {
"administratorLogin": "[parameters('sqlServerSecondaryAdminUsername')]",
"administratorLoginPassword": "[parameters('sqlServerSecondaryAdminPassword')]",
"version": "12.0"
}
}
I specify the readWriteEndpoint like this:
"readWriteEndpoint": {
"failoverPolicy": "Automatic",
"failoverWithDataLossGracePeriodMinutes": 60
}
With a failoverWithDataLossGracePeriodMinutes set to 60 minutes.
What does this mean? I cannot find a clear answer anywhere. Does it mean that:
I think it boils down to: do I have to manually make the failover, if I detect an outage, if I don't care about data loss, but I want to be able to write to my database?
Bonus question: is the reason why the grace period is present because there can be unsynced data on the primary, that will be overwritten, or tossed away, if the secondary becomes the new primary (if i switch manually)?
Sorry, I can't keep it to only one question. I have read a lot and I really need to know this.
A failover group is a named group of databases managed by a single server that can fail over as a unit to another Azure region in case all or some primary databases become unavailable due to an outage in the primary region.
Use SQL Server Management Studio Expand the Always On High Availability node and the Availability Groups node. Right-click the availability group to be failed over, and select Failover.
During the failover, the failover target takes over the primary role, recovers its databases, and brings them online as the new primary databases. The former primary replica, when available, switches to the secondary role, and its databases become secondary databases.
This process normally should complete within 30 seconds or less. There's a short period during which both databases are unavailable, on the order of 0 to 25 seconds, while the roles are switched.
What does this mean?
It means that:
"when a outage is happening in my primary region where my primary database resides, the read/write endpoint points to the primary and only after 60 minutes it fails over to my secondary, which becomes the new primary. "
It can't failover automatically even when the data is synced because the high-availability solution in the primary region is trying to do the same thing, and almost all of the time your primary database will come back quickly in the primary region. Performing an automatic cross-region fail-over would interfere with this.
And
"the reason why the grace period is present, is that because the there can be unsynced data on the primary, that will be overwritten, or tossed away, if the secondary becomes the new primary"
And to allow time for the database to failover within the primary region.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With