 

Polly + API Services That Return Errors as Results

I'm working with a web API that returns 404 when I query data that doesn't exist, and other error codes if the data is malformed or there's some other problem. This then results in an HttpRequestException.

Now I'm wondering about one detail. I'm using Polly on that HttpClient connection so that it retries on communication problems.

In this case, will it work as expected, or will Polly keep retrying on server-side errors like "not found" or "bad request"?

I'm configuring it like this:

```csharp
services.AddHttpClient<OntraportHttpClient>()
    .AddTransientHttpErrorPolicy(p =>
        p.WaitAndRetryAsync(3, _ => TimeSpan.FromMilliseconds(600)));
```
Etienne Charland asked Jul 03 '26 at 00:07



1 Answer

There is a small misunderstanding here: a 400 Bad Request or 404 Not Found response will not result in an HttpRequestException, unless you explicitly call EnsureSuccessStatusCode.
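You can check this without a live server by building the response object by hand. The sketch below uses only System.Net.Http (no Polly), and the variable names are illustrative. It shows that a 404 HttpResponseMessage is an ordinary return value until EnsureSuccessStatusCode is called on it.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Constructing or inspecting a 404 response does not throw anything.
var notFound = new HttpResponseMessage(HttpStatusCode.NotFound);
Console.WriteLine($"IsSuccessStatusCode: {notFound.IsSuccessStatusCode}");

// Only EnsureSuccessStatusCode converts a non-2xx status into an HttpRequestException.
bool threw = false;
try
{
    notFound.EnsureSuccessStatusCode();
}
catch (HttpRequestException ex)
{
    threw = true;
    Console.WriteLine($"EnsureSuccessStatusCode threw {ex.GetType().Name}");
}
```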

AddTransientHttpErrorPolicy handles the following:

  • 408 Request Timeout
  • 5xx Server error
  • HttpRequestException
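Here is a hand-written predicate that encodes those three conditions. It is an illustration, not Polly's actual source, and the name IsTransientHttpError is hypothetical. Running it over a few status codes shows which ones a default transient-error policy would retry.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hand-written predicate encoding the same three rules; an illustration, not Polly's code.
static bool IsTransientHttpError(HttpResponseMessage? response, Exception? exception)
{
    if (exception is HttpRequestException) return true;          // network-level failure
    if (response is null) return false;
    return (int)response.StatusCode >= 500                        // 5xx server error
        || response.StatusCode == HttpStatusCode.RequestTimeout;  // 408
}

var statuses = new[]
{
    HttpStatusCode.BadRequest,          // 400
    HttpStatusCode.NotFound,            // 404
    HttpStatusCode.TooManyRequests,     // 429
    HttpStatusCode.RequestTimeout,      // 408
    HttpStatusCode.ServiceUnavailable,  // 503
};

foreach (var status in statuses)
{
    var response = new HttpResponseMessage(status);
    Console.WriteLine($"{(int)status}: retried = {IsTransientHttpError(response, null)}");
}
```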

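So, as you can see, neither 400, 404, nor 429 Too Many Requests (the typical response code for back-pressure) will trigger your Polly policy. EnsureSuccessStatusCode only converts them into an HttpRequestException in your own code, after the response has already come back through the HttpClient handler pipeline where the policy runs, so a policy registered with AddTransientHttpErrorPolicy will not retry it either.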
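If the service signals back-pressure with 429 and you do want to retry it, you can extend the predicate that AddTransientHttpErrorPolicy provides. The fragment below is a configuration sketch. It assumes the Microsoft.Extensions.Http.Polly package (Polly v7-style API), reuses the question's OntraportHttpClient type, and treats `services` as your IServiceCollection.

```csharp
using System;
using System.Net;
using Microsoft.Extensions.DependencyInjection;
using Polly;

// Assumed to run where `services` (IServiceCollection) is configured, e.g. Program.cs.
services.AddHttpClient<OntraportHttpClient>()
    .AddTransientHttpErrorPolicy(p => p
        // p already handles HttpRequestException, 5xx and 408; add 429 on top of that.
        .OrResult(response => response.StatusCode == HttpStatusCode.TooManyRequests)
        .WaitAndRetryAsync(3, _ => TimeSpan.FromMilliseconds(600)));
```

400 and 404 are deliberately left out, because retrying them would only repeat the same answer. A 429 may also carry a Retry-After header that is worth honouring instead of a fixed 600 ms delay.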


UPDATE: Adding DELETE use case

Use Case

Suppose we have a REST service that exposes removal of a given resource (addressed by a particular URL and invoked via the DELETE HTTP verb).

From the consumer's point of view, this removal can end in one of three states:

  • Succeeded
  • Already Done
  • Failed

There are plenty of arguments online about which status code correctly signals success. It can be 200 (OK) with a body, 204 (No Content) without a body, or 202 (Accepted) if the deletion is asynchronous. Sometimes 404 (Not Found) is also used.

The Already Done state occurs when you try to delete an item that has already been deleted. Without soft deletion it is hard to tell whether the given resource ever existed or was never part of your system. With soft deletion, the service could return 404 for an already deleted resource and 400 (Bad Request) for an unknown one.
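The sketch below shows one way a consumer could map the soft-deletion convention above onto the three states. The ClassifyDelete helper and its string labels are hypothetical. The 404/400 split is an assumption about the service, not something HTTP requires. "Failed" lumps transient (5xx, 429) and permanent failures together, so whether to retry is a separate decision.

```csharp
using System;
using System.Net;

// Hypothetical consumer-side mapping for a DELETE endpoint that uses soft deletion.
// Assumed conventions: 404 = already deleted, 400 = never existed.
static string ClassifyDelete(HttpStatusCode status) => status switch
{
    HttpStatusCode.OK or HttpStatusCode.NoContent or HttpStatusCode.Accepted => "Succeeded",
    HttpStatusCode.NotFound => "AlreadyDone",
    _ => "Failed", // 400 (unknown resource) and everything else, including 5xx
};

Console.WriteLine(ClassifyDelete(HttpStatusCode.NoContent));  // Succeeded
Console.WriteLine(ClassifyDelete(HttpStatusCode.NotFound));   // AlreadyDone
Console.WriteLine(ClassifyDelete(HttpStatusCode.BadRequest)); // Failed
```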

Whenever something fails during request processing, the failure can be treated as temporary or permanent. A network issue can be considered temporary/transient (it can manifest as an HttpRequestException). If there is a database outage and the service is able to detect it, the service can fail fast and return a 5XX response, or it can try to fail over. If there are too many pending requests, the service may decide to throttle them and use back-pressure to shed load. It might return 429 (Too Many Requests) along with an appropriate Retry-After header.
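For the throttling case, a client can read the Retry-After header instead of guessing a delay. The sketch below uses the standard HttpResponseHeaders.RetryAfter property, which can carry either a delta in seconds or an HTTP date. The GetRetryDelay helper and the 600 ms fallback are assumptions for illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

// Pick a wait time for a throttled (429) response: honour Retry-After when present,
// otherwise fall back to a fixed delay supplied by the caller.
static TimeSpan GetRetryDelay(HttpResponseMessage response, DateTimeOffset now, TimeSpan fallback)
{
    RetryConditionHeaderValue? retryAfter = response.Headers.RetryAfter;
    if (retryAfter?.Delta is TimeSpan delta) return delta;          // "Retry-After: 5"
    if (retryAfter?.Date is DateTimeOffset date)                     // "Retry-After: <HTTP-date>"
        return date > now ? date - now : TimeSpan.Zero;
    return fallback;
}

var throttled = new HttpResponseMessage(HttpStatusCode.TooManyRequests);
throttled.Headers.RetryAfter = new RetryConditionHeaderValue(TimeSpan.FromSeconds(5));
Console.WriteLine(GetRetryDelay(throttled, DateTimeOffset.UtcNow, TimeSpan.FromMilliseconds(600))); // 00:00:05
```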

Permanent errors, such as a service that has been shut down for good or a server that actively refuses connection attempts made with a TLS version below 1.3, need human intervention to fix.

Idempotency

Whenever we talk about the retry pattern, we need to consider the following:

  • The potentially introduced observable impact is acceptable
  • The operation can be redone without any irreversible side effect
  • The introduced complexity is negligible compared to the promised reliability

The second criterion is usually referred to as idempotency: if you call the method/endpoint multiple times with the same input, it should return the same output without any additional side effect.

If your service's removal functionality can be considered idempotent, then there is no such state as Already Done. If you call it 100 times, it should always respond with "yep, that's gone". With this in mind, it might make sense to return either 204 or 404 for an idempotent deletion.
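The idempotent behaviour described above can be shown with a toy in-memory store standing in for the service. Everything here (the store, the order-42 id, the Delete function) is hypothetical. The point is that repeating the call leaves the same state and returns the same answer, which is what makes it safe to retry.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Toy in-memory "server"; the resource id is hypothetical.
var store = new HashSet<string> { "order-42" };

// Idempotent DELETE: whether or not the item is still there, the caller gets the
// same answer ("it's gone"), so a retry after a lost response is harmless.
HttpStatusCode Delete(string id)
{
    store.Remove(id);                // no-op if it was already removed
    return HttpStatusCode.NoContent;
}

var results = new List<HttpStatusCode>();
for (int i = 0; i < 100; i++)
    results.Add(Delete("order-42"));

Console.WriteLine($"{results.TrueForAll(r => r == HttpStatusCode.NoContent)}, remaining: {store.Count}");
```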

Resilient strategy

When I talk about a strategy, I mean a chain of resilience policies: if an earlier policy could not "fix" the problem, the next one tries to (so there is a policy escalation).

Server-side: You can use a Bulkhead policy to control the maximum number of concurrent calls; once that threshold is exceeded, you can start throttling requests.
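A minimal server-side bulkhead can be built from a SemaphoreSlim. Its non-blocking Wait(0) either grabs a slot or reports that the limit is reached, in which case the request is shed with 429. The capacity of 2 and the TryEnter helper are hypothetical. Polly also ships a Bulkhead policy, which is not used here.

```csharp
using System;
using System.Net;
using System.Threading;

// At most `maxConcurrent` requests are processed at once; anything beyond that
// is shed immediately with 429 instead of being queued.
const int maxConcurrent = 2;   // arbitrary capacity for the demo
var bulkhead = new SemaphoreSlim(maxConcurrent, maxConcurrent);

// Wait(0) does not block: it returns false when no slot is free.
HttpStatusCode TryEnter() =>
    bulkhead.Wait(0) ? HttpStatusCode.OK : HttpStatusCode.TooManyRequests;

var first = TryEnter();    // takes slot 1
var second = TryEnter();   // takes slot 2
var third = TryEnter();    // no slot left, so the request is throttled
bulkhead.Release();        // one request finishes (a real handler would release in a finally block)
var fourth = TryEnter();   // a slot is free again

Console.WriteLine($"{first} {second} {third} {fourth}");
```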

Client-side: You can set a timeout for each individual request and apply a retry policy for temporary/transient failures. You can also define a global timeout that spans all retry attempts. Or you can apply a circuit breaker that monitors successive failures and backs off for a given period when the service appears overwhelmed or malfunctioning.
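Of the client-side options, the circuit breaker is the least obvious. Here is a deliberately simplified version: it opens after a run of consecutive failures, rejects calls while open, and allows calls again once the break duration has elapsed. The thresholds and helper names are assumptions. The clock is passed in so the behaviour is deterministic. Unlike Polly's breaker, it has no half-open state that limits the number of trial calls.

```csharp
using System;

// Simplified consecutive-failure circuit breaker (not Polly's implementation).
const int failureThreshold = 3;                  // open after 3 failures in a row
var breakDuration = TimeSpan.FromSeconds(30);    // stay open for 30 seconds
int consecutiveFailures = 0;
DateTimeOffset? openUntil = null;

// Returns false while the breaker is open, so the caller can fail fast without calling the service.
bool CanCall(DateTimeOffset now) => openUntil is null || now >= openUntil;

// Records the outcome of a call made while the breaker allowed it.
void Record(bool success, DateTimeOffset now)
{
    if (success) { consecutiveFailures = 0; openUntil = null; return; }
    if (++consecutiveFailures >= failureThreshold)
    {
        openUntil = now + breakDuration;   // back off for the break duration
        consecutiveFailures = 0;
    }
}

var t0 = new DateTimeOffset(2024, 1, 1, 0, 0, 0, TimeSpan.Zero);
for (int i = 0; i < 3; i++) Record(success: false, t0);
Console.WriteLine(CanCall(t0));                  // open: calls are rejected
Console.WriteLine(CanCall(t0.AddSeconds(31)));   // break elapsed: calls are allowed again
```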

My 2 cents: applying a single resilience policy on the client side might not be enough for a robust and resilient system. It might take several policies (on both sides) to establish a communication protocol for problematic periods.

Peter Csala answered Jul 04 '26 at 14:07



