Azure SQL failover group, what does the grace period mean?

Tags:

I am currently reading this: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-auto-failover-group, and I have a hard time understanding the automatic failover policy:

By default, a failover group is configured with an automatic failover policy. The SQL Database service triggers failover after the failure is detected and the grace period has expired. The system must verify that the outage cannot be mitigated by the built-in high availability infrastructure of the SQL Database service due to the scale of the impact. If you want to control the failover workflow from the application, you can turn off automatic failover.

When defining the failover group in an ARM template:

{
  "condition": "[equals(parameters('redundancyId'), 'pri')]",
  "type": "Microsoft.Sql/servers",
  "kind": "v12.0",
  "name": "[variables('sqlServerPrimaryName')]",
  "apiVersion": "2014-04-01-preview",
  "location": "[parameters('location')]",
  "properties": {
    "administratorLogin": "[parameters('sqlServerPrimaryAdminUsername')]",
    "administratorLoginPassword": "[parameters('sqlServerPrimaryAdminPassword')]",
    "version": "12.0"
  },
  "resources": [
    {
      "condition": "[equals(parameters('redundancyId'), 'pri')]",
      "apiVersion": "2015-05-01-preview",
      "type": "failoverGroups",
      "name": "[variables('sqlFailoverGroupName')]",
      "properties": {
        "serverName": "[variables('sqlServerPrimaryName')]",
        "partnerServers": [
          {
            "id": "[resourceId('Microsoft.Sql/servers/', variables('sqlServerSecondaryName'))]"
          }
        ],
        "readWriteEndpoint": {
          "failoverPolicy": "Automatic",
          "failoverWithDataLossGracePeriodMinutes": 60
        },
        "readOnlyEndpoint": {
          "failoverPolicy": "Disabled"
        },
        "databases": [
          "[resourceId('Microsoft.Sql/servers/databases', variables('sqlServerPrimaryName'), variables('sqlDatabaseName'))]"
        ]
      },
      "dependsOn": [
        "[variables('sqlServerPrimaryName')]",
        "[resourceId('Microsoft.Sql/servers/databases', variables('sqlServerPrimaryName'), variables('sqlDatabaseName'))]",
        "[resourceId('Microsoft.Sql/servers', variables('sqlServerSecondaryName'))]"
      ]
    },
    {
      "condition": "[equals(parameters('redundancyId'), 'pri')]",
      "name": "[variables('sqlDatabaseName')]",
      "type": "databases",
      "apiVersion": "2014-04-01-preview",
      "location": "[parameters('location')]",
      "dependsOn": [
        "[variables('sqlServerPrimaryName')]"
      ],
      "properties": {
        "edition": "[variables('sqlDatabaseEdition')]",
        "requestedServiceObjectiveName": "[variables('sqlDatabaseServiceObjective')]"
      }
    }
  ]
},
{
  "condition": "[equals(parameters('redundancyId'), 'pri')]",
  "type": "Microsoft.Sql/servers",
  "kind": "v12.0",
  "name": "[variables('sqlServerSecondaryName')]",
  "apiVersion": "2014-04-01-preview",
  "location": "[variables('sqlServerSecondaryRegion')]",
  "properties": {
    "administratorLogin": "[parameters('sqlServerSecondaryAdminUsername')]",
    "administratorLoginPassword": "[parameters('sqlServerSecondaryAdminPassword')]",
    "version": "12.0"
  }
}

I specify the readWriteEndpoint like this:

    "readWriteEndpoint": {
      "failoverPolicy": "Automatic",
      "failoverWithDataLossGracePeriodMinutes": 60
    }

With a failoverWithDataLossGracePeriodMinutes set to 60 minutes.

What does this mean? I cannot find a clear answer anywhere. Does it mean that:

When an outage is happening in my primary region where my primary database resides, the read/write endpoint points to the primary and only after 60 minutes it fails over to my secondary, which becomes the new primary. In the 60 minutes, the only way to read my data is to use the readOnlyEndpoint directly? OR
My read/write endpoint is turned instantly, if they somehow can detect that there was no data to be synced

I think it boils down to: do I have to manually make the failover, if I detect an outage, if I don't care about data loss, but I want to be able to write to my database?

Bonus question: is the reason why the grace period is present because there can be unsynced data on the primary, that will be overwritten, or tossed away, if the secondary becomes the new primary (if i switch manually)?

Sorry, I can't keep it to only one question. I have read a lot and I really need to know this.

218

asked Jun 15 '19 17:06

mslot

Video Answer

1 Answers

What does this mean?

It means that:

"when a outage is happening in my primary region where my primary database resides, the read/write endpoint points to the primary and only after 60 minutes it fails over to my secondary, which becomes the new primary. "

It can't failover automatically even when the data is synced because the high-availability solution in the primary region is trying to do the same thing, and almost all of the time your primary database will come back quickly in the primary region. Performing an automatic cross-region fail-over would interfere with this.

And

"the reason why the grace period is present, is that because the there can be unsynced data on the primary, that will be overwritten, or tossed away, if the secondary becomes the new primary"

And to allow time for the database to failover within the primary region.

answered Oct 22 '22 23:10

David Browne - Microsoft

Related questions
                            
                                Slow write of database using `mysqldump `
                            
                                Running Selenium in Azure Function
                            
                                Issue while getting the latest record by time from cosmos database
                            
                                Is there a way to programmatically create an Azure ServiceBus Topic
                            
                                SqlQuerySpec QueryBuilder for CosmosDb
                            
                                Service Fabric services debugging issues
                            
                                Deployment for Service Fabric service version upgrade fails on VSTS Release
                            
                                How to use Avro on HDInsight Spark/Jupyter?
                            
                                Azure AD B2C: Bad Request - Request Too Long HTTP Error 400. The size of the request headers is too long. After login
                            
                                Get Azure resource group creation time
                            
                                Azure Function deployment slow
                            
                                Azure DevOps API Add public key
                            
                                azure function service bus output message properties
                            
                                dynamic_template_data doesn't work with sendgrid and azure function integration
                            
                                Getting tenant name from azure CLI
                            
                                How to configure Azure Service Bus Queues to push messages to clients without polling?
                            
                                Request.HttpContext.Connection.ClientCertificate is always null
                            
                                Azure blob storage throttling
                            
                                Pass parameter to Azure CLI Task in DevOps
                            
                                Azure purge script from Azure DevOps

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Azure SQL failover group, what does the grace period mean?

Tags:

azure-sql-database

azure

automatic-failover

mslot

People also ask

Video Answer

1 Answers

David Browne - Microsoft

Recent Activity

Donate For Us