Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I address sudden loss of connection to SQL Azure in my Azure role?

My Azure role grabs stuff to process from a database - it holds an instance of System.Data.SqlClient.SqlConnection and periodically creates an SqlCommand instance and executes an SQL query.

Now once in a while (usually once in several days) running a query will trigger an SqlException exception

The service has encountered an error processing your request. Please try again. Error code 40143. A severe error occurred on the current command. The results, if any, should be discarded.

Which I've already seen many times and now my code catches it, calls Dispose() on the SqlConnection instance and then reopens the connection and retries the query. The latter typically results in another SqlException exception

Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

Which looks pretty much like SQL Azure server not responding or being unavailable for whatever reason.

Currently my code doesn't catch the latter exception, it is propagated outside RoleEntryPoint.Run() and the role is restarted. Restart typically takes about ten minutes and once it completes the problem is gone for a day or so.

I don't like my role restarting - it's takes a while and my service functionality is hindered. I'd like to do something smarter.

What would be a strategy to address this problem? Should I retry the query several times and how many times and with what interval? Should I do something else? When do I give up and let the role just restart?

like image 344
sharptooth Avatar asked Aug 10 '11 06:08

sharptooth


2 Answers

I would strongly recommend you have a look at the Transient Fault Handling Framework for SQL Azure

This will help you handle retry logic for both connection and query attempts, I am using this in production and it works great. There is also a nice article on technet that might be of some use.

[EDIT: 17 Oct 2013]

It looks like this has been picked up by the patterns and practices team at The Transient Fault Handling Application Block

like image 196
David Steele Avatar answered Nov 19 '22 19:11

David Steele


We use TransientFaultHandling and it doesn't handle all of the strange exceptions.

For example, this one popped up yesterday:

The service has encountered an error processing your request. Please try again. Error code 40143. A severe error occurred on the current command. The results, if any, should be discarded. , stacktrace at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning() at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, . . .

The reasonable approach that will work even with this:

  1. Identify a coarse-grained pseudo-transaction where the call happens.
  2. Wrap this block in a try-catch.
  3. on exception, 'roll back' the pseudo-transaction.

Example of a typical workflow:

  • A get Azure queue message
  • B query data from SQL Azure
  • C process data,
  • D upload results
  • E delete message.

Wrap B through C together in a try-catch. If something happens during 'harmless' SQL Azure call, simply bail out without deleting the message, it will simply pop up again after visibility timeout expires.

Actually, this is very common approach: organize into transaction-like blocks, wrap block into try-catch, neatly roll back on exception. And never, never assume that some calls do not fail. All call fail from time to time.

like image 29
Sergey Malgin Avatar answered Nov 19 '22 21:11

Sergey Malgin