Prevent back-off in Kubernetes crash loop

Tags:

kubernetes

I have a pod with some terrible, buggy software in it. One reason Kubernetes is great is that it'll just restart the software when it crashes, which is awesome.

Kubernetes was designed for good software, not terrible software, so it applies an exponential backoff when restarting pods. This means that after repeated crashes I have to wait up to five minutes before my pods are restarted.

Is there any way to cap the Kubernetes backoff strategy? I'd like to change it so that it never waits longer than thirty seconds before starting the pod again.


asked Apr 25 '16 15:04

Riley Lark

1 Answer

Unfortunately, the maximum backoff time for container restarts is not tunable. It is fixed for the sake of node reliability (too many container restarts can overwhelm the node). If you absolutely want to change it in your cluster, you will need to modify the max backoff time in the kubelet source code, compile your own kubelet binary, and distribute it to your nodes.
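To see what changing the cap would do, here is a rough sketch of a capped exponential backoff schedule. It is only an illustration, not the kubelet's actual code: the function name `backoff_schedule` is hypothetical, and the 10-second initial delay and doubling factor are assumptions about the kubelet defaults. The 300-second cap corresponds to the five minutes mentioned in the question.

```python
def backoff_schedule(restarts, initial=10, factor=2, cap=300):
    """Return the delay in seconds before each of the first `restarts` restarts.

    initial and factor are assumed defaults (10s, doubling); cap is the
    maximum delay, 300s matching the five minutes seen in the question.
    """
    delays = []
    delay = initial
    for _ in range(restarts):
        # Never wait longer than the cap, however many crashes have occurred.
        delays.append(min(delay, cap))
        delay *= factor
    return delays


if __name__ == "__main__":
    # Schedule with the five-minute cap.
    print(backoff_schedule(7))
    # Schedule if the cap were lowered to thirty seconds.
    print(backoff_schedule(7, cap=30))
```

Under these assumptions, the default cap is reached after about five consecutive crashes, whereas a thirty-second cap would be reached after two. That also shows why the cap exists: with a low cap, a permanently broken container is restarted far more often.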

answered Sep 26 '22 01:09

Yu-Ju Hong
