How to tell from Npgsql exception if the call is worth a retry (transient fault strategy)

Tags: c#, postgresql, microservices, npgsql, polly

I'm writing a service that will connect to a remote PostgreSQL server. I'm looking for a good way to determine which exceptions should be treated as transient (worth retrying), and how to define an appropriate retry policy for connecting to a remote database.

The service uses Npgsql for data access. The documentation says that Npgsql throws a PostgresException for SQL errors and an NpgsqlException for "server related issues".

So far, the best I have come up with is to assume that every exception that is not a PostgresException is possibly transient and worth retrying, whereas a PostgresException means something is wrong with the query and retrying would not help. Am I correct in this assumption?

I am using Polly to create a retry and circuit breaker policy. My policy looks like this:

    var policy = Policy.Handle<Exception>(AllButPostgresExceptions()) // a PostgresException will not succeed on retry, so don't handle it
        .WaitAndRetryAsync(new[]
        {
            TimeSpan.FromSeconds(1),
            TimeSpan.FromSeconds(2),
            TimeSpan.FromSeconds(4)
        }, onRetry: (exception, span) => Log.Warning(exception, "Postgres Retry Failure: "))
        .WrapAsync(
            Policy.Handle<Exception>(AllButPostgresExceptions())
                .AdvancedCircuitBreakerAsync(
                    failureThreshold: .7,
                    samplingDuration: TimeSpan.FromSeconds(30),
                    minimumThroughput: 20,
                    durationOfBreak: TimeSpan.FromSeconds(30),
                    onBreak: (ex, timeSpan, context) => Log.Warning(ex, "Postgres Circuit Breaker Broken: "),
                    onReset: (context) => Log.Warning("Postgres Circuit Breaker Reset: "),
                    onHalfOpen: () => Log.Warning("Postgres Circuit Breaker Half Open: ")
                ));

    private static Func<Exception, bool> AllButPostgresExceptions()
    {
        // True for anything that is not a PostgresException (or a subclass of it)
        return ex => !(ex is PostgresException);
    }

Is there a better way to determine which errors may be transient?

UPDATE:

Following Shay's suggestions, I opened a new issue in the Npgsql repo and updated my policy to look like this:

    // Backing field for the lazily created policy.
    // On Polly v7 and later, async policies derive from AsyncPolicy rather than Policy,
    // so the type here would be AsyncPolicy or IAsyncPolicy.
    private static Policy postgresTransientPolicy;

    public static Policy PostgresTransientFaultPolicy
    {
        get
        {
            return postgresTransientPolicy ?? (postgresTransientPolicy = Policy.Handle<Exception>(PostgresDatabaseTransientErrorDetectionStrategy())
                .WaitAndRetryAsync(
                    retryCount: 10,
                    sleepDurationProvider: retryAttempt => ExponentialBackoff(retryAttempt, 1.4),
                    onRetry: (exception, span) => Log.Warning(exception, "Postgres Retry Failure: "))
                .WrapAsync(
                    Policy.Handle<Exception>(PostgresDatabaseTransientErrorDetectionStrategy())
                        .AdvancedCircuitBreakerAsync(
                            failureThreshold: .4,
                            samplingDuration: TimeSpan.FromSeconds(30),
                            minimumThroughput: 20,
                            durationOfBreak: TimeSpan.FromSeconds(30),
                            onBreak: (ex, timeSpan, context) => Log.Warning(ex, "Postgres Circuit Breaker Broken: "),
                            onReset: (context) => Log.Warning("Postgres Circuit Breaker Reset: "),
                            onHalfOpen: () => Log.Warning("Postgres Circuit Breaker Half Open: ")
                        )));
        }
    }

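    private static TimeSpan ExponentialBackoff(int retryAttempt, double exponent)
    {
        // Note: this computes retryAttempt^exponent, which grows polynomially in the attempt number;
        // Math.Pow(exponent, retryAttempt) would grow exponentially.
        //TODO add random 20% variance on the exponent
        return TimeSpan.FromSeconds(Math.Pow(retryAttempt, exponent));
    }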

    private static Func<Exception, bool> PostgresDatabaseTransientErrorDetectionStrategy()
    {
        return ex =>
        {
            // If it is not a PostgresException, we must assume it may be transient
            var pgex = ex as PostgresException;
            if (pgex == null)
                return true;

            switch (pgex.SqlState)
            {
                case "53000":   // insufficient_resources
                case "53100":   // disk_full
                case "53200":   // out_of_memory
                case "53300":   // too_many_connections
                case "53400":   // configuration_limit_exceeded
                case "57P03":   // cannot_connect_now
                case "58000":   // system_error
                case "58030":   // io_error

                // I am not sure whether the next few should be treated as transient, but I am guessing so

                case "55P03":   // lock_not_available
                case "55006":   // object_in_use
                case "55000":   // object_not_in_prerequisite_state
                case "08000":   // connection_exception
                case "08003":   // connection_does_not_exist
                case "08006":   // connection_failure
                case "08001":   // sqlclient_unable_to_establish_sqlconnection
                case "08004":   // sqlserver_rejected_establishment_of_sqlconnection
                case "08007":   // transaction_resolution_unknown
                    return true;
            }

            return false;
        };
    }
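The TODO in ExponentialBackoff (random variance on the delay) could be filled in roughly as follows. This is only a sketch: BackoffWithJitter and its parameters are hypothetical names, not part of the original policy, and it assumes the jitter is applied to the computed delay rather than to the exponent. It is written as a C# 9+ top-level program so it can be run on its own.

```csharp
using System;

// Hypothetical helper (not from the original post): exponential backoff with random jitter.
static TimeSpan BackoffWithJitter(int retryAttempt, double growthFactor, double jitterFraction, Random random)
{
    // Exponential in the attempt number: growthFactor^retryAttempt seconds.
    double baseSeconds = Math.Pow(growthFactor, retryAttempt);

    // Uniform multiplier in [1 - jitterFraction, 1 + jitterFraction).
    double multiplier = 1.0 + jitterFraction * (2.0 * random.NextDouble() - 1.0);

    return TimeSpan.FromSeconds(baseSeconds * multiplier);
}

// Print a few sample delays; the values differ from run to run because of the jitter.
var rng = new Random();
for (int attempt = 1; attempt <= 5; attempt++)
{
    double seconds = BackoffWithJitter(attempt, 1.4, 0.2, rng).TotalSeconds;
    Console.WriteLine($"attempt {attempt}: wait {seconds:F2}s");
}
```

Jitter spreads out retries from many clients so they do not all hit the server at the same instant. Note that a single Random instance is not thread-safe, so a policy shared across threads would need a lock or a per-thread instance.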


asked Mar 15 '17 23:03

Dominick O'Dierno

1 Answer

Your approach is good. NpgsqlException usually means a network/IO error, although you can examine the inner exception and check for IOException to be sure.
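As a rough illustration of the inner-exception check, the sketch below walks the InnerException chain and looks for I/O-related causes. HasNetworkCause is a hypothetical helper name, and including SocketException and TimeoutException alongside IOException is an assumption. An InvalidOperationException stands in for NpgsqlException so the example runs without the Npgsql package.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

// Hypothetical helper: true if the exception or any inner exception is an I/O, socket, or timeout error.
static bool HasNetworkCause(Exception ex)
{
    for (var current = ex; current != null; current = current.InnerException)
    {
        if (current is IOException || current is SocketException || current is TimeoutException)
            return true;
    }
    return false;
}

// Stand-in for an NpgsqlException wrapping a network failure.
var wrapped = new InvalidOperationException("stand-in for NpgsqlException", new IOException("connection reset"));
Console.WriteLine(HasNetworkCause(wrapped));                       // True
Console.WriteLine(HasNetworkCause(new ArgumentException("bad")));  // False
```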

PostgresException is thrown when PostgreSQL reports an error, which in most cases indicates a problem with the query. However, some server-side issues are transient (e.g. too many connections); you can examine the SQL error code (SQLSTATE) to detect those - see the PG docs.
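One possible way to examine the error code is to classify by SQLSTATE class (the first two characters) plus a few individual codes. This is a sketch, not a definitive list: IsTransientSqlState is a hypothetical name, and treating 40001 and 40P01 as retryable assumes the caller retries the whole transaction.

```csharp
using System;

// Hypothetical classifier working on the five-character SQLSTATE string alone.
static bool IsTransientSqlState(string sqlState)
{
    if (string.IsNullOrEmpty(sqlState) || sqlState.Length != 5)
        return false;

    // The first two characters of a SQLSTATE identify the error class.
    switch (sqlState.Substring(0, 2))
    {
        case "08": // connection exceptions
        case "53": // insufficient resources
            return true;
    }

    switch (sqlState)
    {
        case "57P03": // cannot_connect_now
        case "55P03": // lock_not_available
        case "40001": // serialization_failure (assumption: the whole transaction is retried)
        case "40P01": // deadlock_detected (same assumption)
            return true;
        default:
            return false;
    }
}

Console.WriteLine(IsTransientSqlState("53300")); // True  (too_many_connections)
Console.WriteLine(IsTransientSqlState("42601")); // False (syntax_error)
```

With Npgsql, the string passed in would be the SqlState property of the caught PostgresException.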

It may be a good idea to add an IsTransient property to these exceptions, encoding these checks inside Npgsql itself - you're welcome to open an issue for that on the Npgsql repo.
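For readers on newer stacks, a predicate built on such a property might look like the sketch below. It assumes .NET 5 or later, where DbException defines a virtual IsTransient property, and a provider version whose exceptions derive from DbException and override it; check this against the Npgsql version in use. IsWorthRetrying is a hypothetical name.

```csharp
using System;
using System.Data.Common;

// Hypothetical predicate: defer to the provider's own classification when it is available.
static bool IsWorthRetrying(Exception ex)
{
    // DbException.IsTransient exists on .NET 5 and later; providers may override it.
    if (ex is DbException dbEx)
        return dbEx.IsTransient;

    // Assumption carried over from the question: anything that is not a database error may be transient.
    return true;
}

Console.WriteLine(IsWorthRetrying(new TimeoutException())); // True (not a DbException)
```

A method like this can be passed as the predicate to Policy.Handle<Exception>(...) in place of a hand-written SQLSTATE list.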

answered Sep 20 '22 12:09

Shay Rojansky
