Which HTTP status code should I use for a health-check failure?

I'm implementing a /_status/ endpoint which does some sanity checks on data in our database.

For example, we are collecting measurements and the status should go "bad" if the latest measurement is over an hour old.

I would like to point Pingdom at this URL to leverage their alerting infrastructure and tell us when something's wrong.

On a "good" status I will serve an HTML page with an HTTP 200 OK status. But what would an appropriate HTTP status code be for "bad"? Or would it be more correct not to convey this information via status code, but via HTML content instead?

Thanks!

945

asked Aug 19 '14 17:08

Paul M Furley

4 Answers

Well... this is an old question, but I ended up here, so I thought I'd give my two cents. It seems pretty clear that a 2xx should be returned if all is OK.

If health is not OK, I think it should return a 5xx result (4xx talks about the client being at fault in the request; 2xx and 3xx are all successful to some degree).

I think that a 5xx is correct because this is a special request that reports on the state of the whole service. It is also practical: most load balancers offer liveness checks based on response codes, and not all offer a way to parse a more complex payload (other than perhaps a regular-expression match, which can make the check brittle).

I agree with @Julien that a 500 (specifically) doesn't seem appropriate, and we've decided on 503 Service Unavailable.

503 seems to fit for a couple of reasons:

It's a 5xx family result code which indicates that something is going on on the server side.
It has a temporary nature to it indicating that it may recover.
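As a rough illustration of the 200/503 choice, here is a minimal Python sketch using the freshness rule from the question (a measurement more than an hour old means "bad"). The names `MAX_AGE` and `health_status` are made up for this example, and how you obtain the latest measurement timestamp is left out.

```python
from datetime import datetime, timedelta, timezone
from http import HTTPStatus

# Threshold taken from the question; the name is an assumption for this sketch.
MAX_AGE = timedelta(hours=1)


def health_status(latest_measurement: datetime, now: datetime) -> HTTPStatus:
    """Return 200 if the newest measurement is fresh enough, otherwise 503."""
    if now - latest_measurement > MAX_AGE:
        return HTTPStatus.SERVICE_UNAVAILABLE
    return HTTPStatus.OK


now = datetime(2014, 8, 19, 17, 0, tzinfo=timezone.utc)
fresh = health_status(now - timedelta(minutes=5), now)
stale = health_status(now - timedelta(hours=2), now)
print(fresh.value, stale.value)  # 200 503
```

The endpoint would then send whatever `health_status` returns as the response code, so a monitor or load balancer only has to look at the status line.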

189

answered Oct 04 '22 04:10

Paolo

We just had a similar discussion in our group. We decided for our purposes that the HTTP response codes should be reporting on your server's success or failure to honor the request. For a GET, this would mean whether or not you can respond with the requested resource. In this case, the requested resource is a health report, so as long as you're returning that successfully, it should be a 200 response.

We're returning JSON for our health check, with a top-level "isHealthy" field set to true or false. Our load balancer and other monitors will parse the JSON and use this field to determine if the system is healthy or not.

If you don't want to parse JSON in your monitors, you could try putting a custom response header to indicate binary health of the system, e.g., System-Health: true or System-Health: false. You might have better luck getting monitors which can check that.
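To make that concrete, here is a small Python sketch of both sides: a response that is always 200 but carries the health in an "isHealthy" field and a System-Health header, and a monitor-side check that parses the JSON. The function names are invented for illustration; this is not our actual code.

```python
import json
from typing import Dict, Tuple


def build_health_response(is_healthy: bool) -> Tuple[int, Dict[str, str], str]:
    """Always answer 200; report health in the body and in a custom header."""
    headers = {
        "Content-Type": "application/json",
        "System-Health": "true" if is_healthy else "false",
    }
    body = json.dumps({"isHealthy": is_healthy})
    return 200, headers, body


def monitor_sees_healthy(status: int, body: str) -> bool:
    """Monitor side: the request must succeed AND the report must say healthy."""
    if status != 200:
        return False
    return json.loads(body).get("isHealthy") is True


status, headers, body = build_health_response(False)
print(status, headers["System-Health"], body)  # 200 false {"isHealthy": false}
print(monitor_sees_healthy(status, body))  # False
```

A monitor that cannot parse JSON could compare the System-Health header against the string "true" instead.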

If you really want to use a response code, I would recommend an additional endpoint called something like "health" which returns a "204 No Content" when healthy, and a "404 Not Found" when not healthy. In this case, the resource defined by the URL is, symbolically, the health of your system, and so if it's healthy, you can return a successful response. If it's unhealthy, then its health can't be found, hence the 404.
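A sketch of what such a "health" endpoint might look like with Python's standard http.server module. The path `/health`, the helper `status_for`, and the placeholder `system_is_healthy` are all assumptions made for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def status_for(healthy: bool) -> int:
    """204 No Content when healthy, 404 Not Found when not."""
    return 204 if healthy else 404


def system_is_healthy() -> bool:
    """Hypothetical placeholder; a real service would run its checks here."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            self.send_response(status_for(system_is_healthy()))
        else:
            self.send_response(404)  # unknown path
        self.end_headers()  # neither response carries a body


def serve(port: int = 8000) -> None:
    """Blocks forever; call this from your own entry point."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

One thing to keep in mind with this variant: to a monitor, an unhealthy 404 looks the same as a mistyped URL.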

answered Oct 04 '22 03:10

brianmearns

If your data is 'bad' because there is a service failure (even if that is a backend job failing) then an HTTP 500 seems like a valid response. It indicates that something, somewhere is broken.

It isn't very specific; you're shrugging your shoulders and saying:

The 500 (Internal Server Error) status code indicates that the server encountered an unexpected condition that prevented it from fulfilling the request.

IETF RFC 7231, Section 6.6.1

answered Oct 04 '22 05:10

Ken

If you ask for health and the server state is not healthy, I'm partial to 409 Conflict, which "Indicates that the request could not be processed because of conflict in the current state of the resource".

Some people might object that if you can respond then the request can be processed, but I disagree. Every error message is a response. The server defines resource semantics. If you ask for the good news resource and the server responds "here is bad news", it didn't give you what it defines that resource to offer.

In practice, it's much easier to say 2xx = "up" and 4xx = "down", pipe request counts into an availability metric, and have a load balancer remove the server from its pool based on the response code. Coming up with ways to argue that "hey, we told you something, so 200 OK" just seems like missing the forest for the trees to me.
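For instance, a bare-bones availability metric over a series of health-check status codes might look like this in Python. The function names are illustrative, and treating every non-2xx code (not just 4xx) as "down" is an assumption of this sketch.

```python
from typing import Sequence


def is_up(status_code: int) -> bool:
    """2xx counts as "up"; anything else counts as "down" in this sketch."""
    return 200 <= status_code < 300


def availability(status_codes: Sequence[int]) -> float:
    """Fraction of health checks that came back "up"."""
    if not status_codes:
        raise ValueError("need at least one health-check result")
    return sum(is_up(code) for code in status_codes) / len(status_codes)


codes = [200, 200, 409, 200]
print(availability(codes))  # 0.75
```

No payload parsing is needed, which is the whole point: the same `is_up` test is what a load balancer effectively applies when deciding whether to keep a server in its pool.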

answered Oct 04 '22 05:10

bwtaylor

Tags: http, web-applications, monitoring