I want to implement graceful shutdown in a Kubernetes Pod. I know I need to listen for SIGTERM, which indicates the start of the shutdown procedure. But what exactly do I do when I receive it?
At the very least I must wait for all running requests to finish before exiting. But can the pod still receive new requests after receiving the SIGTERM? (It's exposed using a service.) I can't find any explicit documentation on this.
The docs state:
Pod is removed from endpoints list for service, and are no longer considered part of the set of running pods for replication controllers. Pods that shutdown slowly can continue to serve traffic as load balancers (like the service proxy) remove them from their rotations.
So that seems to imply that new requests can still come in. So how long should I continue to expect new requests before graceful termination? Do I simply ignore the SIGTERM, continue to serve requests as usual and wait for the eventual SIGKILL?
I suppose ensuring that future readiness checks fail, and then waiting longer than the interval at which they occur before terminating, might work?
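Something like this is what I have in mind (a rough sketch in Go, purely for illustration; the port, the wait durations and the shuttingDown flag are placeholders of mine, not anything Kubernetes prescribes):

package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "sync/atomic"
    "syscall"
    "time"
)

func main() {
    // Flag that the readiness endpoint would start reporting as "not ready".
    var shuttingDown atomic.Bool

    srv := &http.Server{Addr: ":8080"} // application handlers registered as usual
    go func() {
        if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatal(err)
        }
    }()

    // Block until Kubernetes sends SIGTERM.
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGTERM)
    <-sigs

    // 1. Start failing readiness checks so the pod gets pulled out of the
    //    Service's endpoints.
    shuttingDown.Store(true)

    // 2. Keep serving as usual while that propagates; the wait should exceed
    //    the probe period times the failure threshold, plus some slack for
    //    the endpoints/kube-proxy sync (15s is just a placeholder).
    time.Sleep(15 * time.Second)

    // 3. Stop accepting new connections and wait for in-flight requests,
    //    bounded so everything stays within the termination grace period
    //    (30 seconds by default).
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil {
        log.Printf("graceful shutdown did not complete: %v", err)
    }
}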
I'm on Kubernetes 1.2.5, if that makes any difference, and am talking about rolling updates in particular, but also scaling replication controllers down generally.
SIGTERM Signal - Kubernetes then sends the SIGTERM signal to the pod, which starts the (default) 30-second grace period during which the pod is expected to finish its in-flight work, save its data and shut down.
If the readiness probe fails, the endpoints controller removes the Pod's IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a container does not provide a readiness probe, the default state is Success.
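Concretely, the probe has to point at something in the application that can be made to fail on demand; without a readinessProbe in the pod spec, readiness simply defaults to Success and there is nothing to fail. A minimal sketch of such a handler (Go; the /ready path, flag name and port are assumptions for illustration):

package main

import (
    "log"
    "net/http"
    "sync/atomic"
)

// notReady is flipped to true when the process decides to shut down
// (for example in its SIGTERM handler); until then the probe succeeds.
var notReady atomic.Bool

func readyHandler(w http.ResponseWriter, r *http.Request) {
    if notReady.Load() {
        // Once this has failed enough times in a row (failureThreshold),
        // the endpoints controller removes the pod from the Service.
        http.Error(w, "draining", http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
}

func main() {
    http.HandleFunc("/ready", readyHandler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}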
A restarting container can indicate problems with memory (see the Out of Memory section), CPU usage, or just an application exiting prematurely. If a container is being restarted because of CPU usage, try increasing the requested and limit amounts for CPU in the pod spec.
Without a service, Pods are assigned an IP address which allows access from within the cluster. Other pods within the cluster can hit that IP address and communication happens as normal.
I recently faced a similar problem. I used a simple preStop hook, which introduces a delay (sleep) between the start of termination and the SIGTERM being delivered to the underlying process:
# Note: terminationGracePeriodSeconds (30 by default) also counts the time
# spent in this hook, so it needs to be set higher than the sleep below.
lifecycle:
  preStop:
    exec:
      command:
        - "sleep"
        - "60"
This delay helps:
- the load balancer (e.g. the service proxy) to remove (sync out) the pod being terminated from its rotation
- the terminating pod to complete the requests it had already received before termination started
- the pod to fulfil requests that still arrive between the start of termination and the load balancer update (sync)

The preStop hook can be made more intelligent for pods with an unpredictable serving time, for example by having the application report how much work is still in flight, as sketched below.
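A rough sketch of the application side of that idea (Go; the /inflight path and the handler names are my own, made up for illustration):

package main

import (
    "fmt"
    "log"
    "net/http"
    "sync/atomic"
)

// inflight tracks how many application requests are currently being handled.
var inflight atomic.Int64

// countInFlight wraps a handler and keeps the in-flight counter up to date.
func countInFlight(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        inflight.Add(1)
        defer inflight.Add(-1)
        next.ServeHTTP(w, r)
    })
}

func main() {
    mux := http.NewServeMux()

    // Application traffic goes through the counter...
    mux.Handle("/", countInFlight(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // ... real application logic ...
        w.WriteHeader(http.StatusOK)
    })))

    // ...while the drain endpoint itself is not counted, so it can reach 0.
    mux.HandleFunc("/inflight", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "%d", inflight.Load())
    })

    log.Fatal(http.ListenAndServe(":8080", mux))
}

A preStop command could then poll localhost:8080/inflight in a loop and exit as soon as it reads 0, instead of always sleeping 60 seconds; either way it is still bounded by terminationGracePeriodSeconds.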