Restarting pods quickly

Tags:

kubernetes

I have been experimenting with Kubernetes recently, and I have been trying to test failover in pods by having a replication controller whose containers crash as soon as they are used (thus causing a restart).

I have adapted the bashttpd project for this: https://github.com/Chronojam/bashttpd

(I have set it up so that it serves the hostname of the container, then exits.)
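
For reference, this is roughly what the replication controller looks like; the image name, port, and labels here are my guesses for illustration, not the exact manifest:

apiVersion: v1
kind: ReplicationController
metadata:
  name: chronojam-serve-once
spec:
  replicas: 6
  selector:
    app: serve-once
  template:
    metadata:
      labels:
        app: serve-once
    spec:
      containers:
      - name: bashttpd
        image: chronojam/bashttpd    # serves the container hostname once, then exits
        ports:
        - containerPort: 80
      restartPolicy: Always          # the default; crashed containers are restarted with backoff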

This works great, except that the restart is far too slow for what I am trying to do: it works for the first couple of requests, then stops for a while, then starts working again once the pods have restarted. (Ideally I'd like to see no interruption at all when accessing the service.)

I think (but am not sure) that the backoff delay mentioned here is to blame: https://github.com/kubernetes/kubernetes/blob/master/docs/user-guide/pod-states.md#restartpolicy

Some output:

#] kubectl get pods
NAME                         READY     STATUS    RESTARTS   AGE
chronojam-blog-a23ak         1/1       Running   0          6h
chronojam-blog-abhh7         1/1       Running   0          6h
chronojam-serve-once-1cwmb   1/1       Running   7          4h
chronojam-serve-once-46jck   1/1       Running   7          4h
chronojam-serve-once-j8uyc   1/1       Running   3          4h
chronojam-serve-once-r8pi4   1/1       Running   7          4h
chronojam-serve-once-xhbkd   1/1       Running   4          4h
chronojam-serve-once-yb9hc   1/1       Running   7          4h
chronojam-tactics-is1go      1/1       Running   0          5h
chronojam-tactics-tqm8c      1/1       Running   0          5h
#] curl http://serve-once.chronojam.co.uk
<h3> chronojam-serve-once-j8uyc </h3>
#] curl http://serve-once.chronojam.co.uk
<h3> chronojam-serve-once-r8pi4 </h3>
#] curl http://serve-once.chronojam.co.uk
<h3> chronojam-serve-once-yb9hc </h3>
#] curl http://serve-once.chronojam.co.uk
<h3> chronojam-serve-once-46jck </h3>
#] curl http://serve-once.chronojam.co.uk
#] curl http://serve-once.chronojam.co.uk

You'll also note that even though there should still be two healthy pods, the service stops returning anything after the fourth request.

So my question is twofold:

1) Can I tweak the backoff delay?

2) Why does my service not send my requests to the healthy containers?

Observations:

I think that it might be the webserver itself not being able to start serving requests that quickly, so Kubernetes is recognizing those pods as healthy and sending requests there, but getting nothing back because the process hasn't started yet.
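
If that is the case, a readinessProbe on the container would let Kubernetes check whether the process is actually accepting connections before routing traffic to the pod. A minimal sketch of what that might look like (the path, port, and timings are my guesses):

readinessProbe:
  httpGet:
    path: /          # any URL the server answers once it is up
    port: 80
  initialDelaySeconds: 1
  timeoutSeconds: 1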

asked Dec 18 '15 by Chronojam



1 Answer

I filed an issue to document the recommended practice. I put a sketch of the approach in the issue:

https://github.com/kubernetes/kubernetes/issues/20473

  • ensure the pods have a non-zero terminationGracePeriodSeconds set
  • configure a readinessProbe on the main serving container of the pods
  • handle SIGTERM in the application: fail the readinessProbe but continue to handle normal requests and do not exit
  • set maxUnavailable and/or maxSurge in the Deployment API spec (available in 1.2) large enough to ensure enough serving instances (see the sketch after this list)
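
Put together, a Deployment along those lines might look roughly like this (a sketch only; the names, image, replica count, and probe values are illustrative, not from the issue):

apiVersion: extensions/v1beta1    # the Deployment API as of Kubernetes 1.2
kind: Deployment
metadata:
  name: serve-once
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1           # at most one replica out of service at a time
      maxSurge: 1
  template:
    metadata:
      labels:
        app: serve-once
    spec:
      terminationGracePeriodSeconds: 30   # non-zero, so SIGTERM arrives before SIGKILL
      containers:
      - name: bashttpd
        image: chronojam/bashttpd
        ports:
        - containerPort: 80
        readinessProbe:           # failing this removes the pod from the service endpoints
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 1
          timeoutSeconds: 1

With this shape, a pod that fails its readinessProbe is dropped from the service endpoints while it restarts, so traffic should only reach replicas that are actually ready.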

Container restarts, especially when they pull images, are fairly expensive for the system. The Kubelet backs off restarts of crashing containers in order to degrade gracefully and avoid DoSing Docker, the registry, the apiserver, etc.
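
As an illustration of the image-pull cost (this part is not from the issue), pinning an image tag and letting the node reuse its cached copy takes the pull out of the restart path:

containers:
- name: bashttpd
  image: chronojam/bashttpd:v1    # a pinned tag (illustrative)
  imagePullPolicy: IfNotPresent   # reuse the cached image on restart instead of re-pulling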

answered by briangrant