I have a C# application that fetches data from SQL Server hosted in a somewhat flaky environment. There is nothing I can do to address the environmental issues, so I need to handle them as gracefully as possible.
To do so, I want to retry operations that are the result of infrastructure failures such as network glitches, SQL servers going off-line because they're being rebooted, query time-outs etc. At the same time, I don't want to retry queries if they've failed for logical errors. I just want those to bubble the exception up to the client.
My question is this: what is the best way to distinguish between environmental problems (lost connections, time-outs) and other kinds of exceptions (things like logical errors that would have happened even if the environment was stable).
Is there a commonly used pattern in C# for dealing with things like this? For example, is there a property I can check on the SqlConnection object to detect failed connections? If not, what is the best way to approach this problem?
For what it is worth, my code isn't anything special:
using (SqlConnection connection = new SqlConnection(myConnectionString)) using (SqlCommand command = connection.CreateCommand()) { command.CommandText = mySelectCommand; connection.Open(); using (SqlDataReader reader = command.ExecuteReader()) { while (reader.Read()) { // Do something with the returned data. } } }
A transient error, also known as a transient fault, has an underlying cause that soon resolves itself. An occasional cause of transient errors is when the Azure system quickly shifts hardware resources to better load-balance various workloads. Most of these reconfiguration events finish in less than 60 seconds.
Not closing connections could cause timeouts as the connection pool may run out of available connections that can be used. A side point to this. If you use the using keyword, that will automatically close the connection for you.
If connection pooling is off, the transaction is rolled back after SqlConnection. Close is called. Transactions started through System. Transactions are controlled through the System.
One single SqlException
(may) wraps multiple SQL Server errors. You can iterate through them with Errors
property. Each error is SqlError
:
foreach (SqlError error in exception.Errors)
Each SqlError
has a Class
property you can use to roughly determine if you can retry or not (and in case you retry if you have to recreate connection too). From MSDN:
Class
< 10 is for errors in information you passed then (probably) you can't retry if first you don't correct inputs.Class
from 11 to 16 are "generated by user" then probably again you can't do anything if user first doesn't correct his inputs. Please note that class 16 includes many temporary errors and class 13 is for deadlocks (thanks to EvZ) so you may exclude these classes if you handle them one by one.Class
from 17 to 24 are generic hardware/software errors and you may retry. When Class
is 20 or higher you have to recreate connection too. 22 and 23 may be serious hardware/software errors, 24 indicates a media error (something user should be warned but you may retry in case it was just a "temporary" error).You can find a more detailed description of each class here.
In general if you handle errors with their class you won't need to know exactly each error (using error.Number
property or exception.Number
which is just a shortcut for first SqlError
in that list). This has the drawback that you may retry when it's not useful (or error can't be recovered). I'd suggest a two steps approach:
SELECT * FROM master.sys.messages
) to see what you want to handle (knowing how). That view contains messages in all supported languages so you may need to filter them by msglangid
column (for example 1033 for English).Class
is 13 or higher than 16 (and reconnecting if 20 or higher).One word about higher classes. How to handle these errors isn't simple and it depends on many factors (including risk management for your application). As a simple first step I wouldn't retry for 22, 23, and 24 when attempting a write operation: if database, file system or media are seriously damaged then writing new data may deteriorate data integrity even more (SQL Server is extremely careful to do not compromise DB for a query even in critical circumstances). A damaged server, it depends on your DB network architecture, might even be hot-swapped (automatically, after a specified amount of time, or when a specified trigger is fired). Always consult and work close to your DBA.
Strategy for retrying depends on error you're handling: free resources, wait for a pending operation to complete, take an alternative action, etc. In general you should retry only if all errors are "retry-able":
bool rebuildConnection = true; // First try connection must be open for (int i=0; i < MaximumNumberOfRetries; ++i) { try { // (Re)Create connection to SQL Server if (rebuildConnection) { if (connection != null) connection.Dispose(); // Create connection and open it... } // Perform your task // No exceptions, task has been completed break; } catch (SqlException e) { if (e.Errors.Cast<SqlError>().All(x => CanRetry(x))) { // What to do? Handle that here, also checking Number property. // For Class < 20 you may simply Thread.Sleep(DelayOnError); rebuildConnection = e.Errors .Cast<SqlError>() .Any(x => x.Class >= 20); continue; } throw; } }
Wrap everything in try
/finally
to properly dispose connection. With this simple-fake-naive CanRetry()
function:
private static readonly int[] RetriableClasses = { 13, 16, 17, 18, 19, 20, 21, 22, 24 }; private static bool CanRetry(SqlError error) { // Use this switch if you want to handle only well-known errors, // remove it if you want to always retry. A "blacklist" approach may // also work: return false when you're sure you can't recover from one // error and rely on Class for anything else. switch (error.Number) { // Handle well-known error codes, } // Handle unknown errors with severity 21 or less. 22 or more // indicates a serious error that need to be manually fixed. // 24 indicates media errors. They're serious errors (that should // be also notified) but we may retry... return RetriableClasses.Contains(error.Class); // LINQ... }
Some pretty tricky ways to find list of non critical errors here.
Usually I embed all this (boilerplate) code in one method (where I can hide all the dirty things done to create/dispose/recreate connection) with this signature:
public static void Try( Func<SqlConnection> connectionFactory, Action<SqlCommand> performer);
To be used like this:
Try( () => new SqlConnection(connectionString), cmd => { cmd.CommandText = "SELECT * FROM master.sys.messages"; using (var reader = cmd.ExecuteReader()) { // Do stuff } });
Please note that skeleton (retry on error) can be used also when you're not working with SQL Server (actually it can be used for many other operations like I/O and network related stuff so I'd suggest to write a general function and to reuse it extensively).
You can simply SqlConnectionStringBuilder properties to sql connection retry.
var conBuilder = new SqlConnectionStringBuilder(Configuration["Database:Connection"]); conBuilder.ConnectTimeout = 90; conBuilder.ConnectRetryInterval = 15; conBuilder.ConnectRetryCount = 6;
Note:- Required .Net 4.5 or later.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With