My main goal is to avoid a long delay before the health status changes to "Critical" when I can predict that my service will go down. I would combine this with the HTTP health check already in place.
Considered solutions:
I have tried a TTL check, but this introduces the burden of converting the service to constantly send its current status.
Using a TTL check with a really high TTL and sending "healthy" once on restart came to mind, but if this initial request fails, the service stays unhealthy for far too long.
Reducing the interval of my HTTP health check would mitigate the problem a bit, but it also adds more overhead.
If you can predict that the service will go down, you should consider putting it into maintenance mode. This removes it from DNS and API results immediately. The documentation describes how to put a service into maintenance mode.
Health checks will always have a delay, since they are executed periodically and are designed to detect unexpected downtime. If you know the service is going down for an update, upgrade, reboot, or decommissioning, the best way to minimize the impact on users is to remove it before doing any work on it.
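As a rough illustration of the approach above, here is a minimal sketch that toggles maintenance mode from a deploy script. It assumes the registry is Consul and uses the agent HTTP endpoint `/v1/agent/service/maintenance/:service_id`. The agent address, the service ID `web-1`, and the helper names are placeholders chosen for this example, and ACL tokens are left out.

```python
import urllib.parse
import urllib.request

# Assumption: a local Consul agent listening on the default HTTP port.
CONSUL_AGENT = "http://127.0.0.1:8500"


def maintenance_request(service_id, enable, reason=""):
    """Build the PUT request that toggles maintenance mode for one service."""
    params = {"enable": "true" if enable else "false"}
    if reason:
        params["reason"] = reason
    url = "{}/v1/agent/service/maintenance/{}?{}".format(
        CONSUL_AGENT,
        urllib.parse.quote(service_id, safe=""),
        urllib.parse.urlencode(params),
    )
    return urllib.request.Request(url, method="PUT")


def set_maintenance(service_id, enable, reason=""):
    """Send the request to the agent; urlopen raises on an error response."""
    request = maintenance_request(service_id, enable, reason)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical service ID; call this before stopping the service ...
    set_maintenance("web-1", True, "rolling upgrade")
    # ... and this once the service is back up and passing its HTTP check.
    set_maintenance("web-1", False)
```

The same toggle should also be available from the command line as `consul maint -enable -service=web-1 -reason="rolling upgrade"` and `consul maint -disable -service=web-1`, assuming the `consul` binary can reach the agent. The existing HTTP health check stays in place to catch unexpected failures.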