Kubernetes recreate pod if node becomes offline timeout

Tags:

2 Answers

If Taint Based Evictions are present in the pod definition, controller manager will not be able to evict the pod that tolerates the taint. Even if you don't define an eviction policy in your configuration, it gets a default one since Default Toleration Seconds admission controller plugin is enabled by default.

Default Toleration Seconds admission controller plugin configures your pod like below:

tolerations:
- key: node.kubernetes.io/not-ready
  effect: NoExecute
  tolerationSeconds: 300
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300

You can verify this by inspecting definition of your pod:

kubectl get pods -o yaml -n <namespace> <pod-name>`

According to above toleration it takes more than 5 minutes to recreate the pod on another ready node since pod can tolerate not-ready taint for up to 5 minutes. In this case, even if you set --pod-eviction-timeout to 20s, there is nothing controller manager can do because of the tolerations.

But why it takes more than 5 minutes? Because the node will be considered as down after --node-monitor-grace-period which defaults to 40s. After that, pod toleration timer starts.

Available Options

If you want to be in control of these timings below you will find plenty of options to do so. However, modifying these options should be avoided. If you use tight timings which might create an overhead on etcd as every node will try to update its status very often.

In addition to this, currently it is not clear how to propagate changes in controller manager, api server and kubelet configuration to all nodes in a living cluster. Please see Tracking issue for changing the cluster and Dynamic Kubelet Configuration. As of this writing, reconfiguring a node's kubelet in a live cluster is in beta.

You can configure control plane and kubelet during kubeadm init or join phase. Please refer to Customizing control plane configuration with kubeadm and Configuring each kubelet in your cluster using kubeadm for more details.

Assuming you have a single node cluster:

controller manager includes:
- --node-monitor-grace-period default 40s
- --node-monitor-period default 5s
- --pod-eviction-timeout default 5m0s
api server includes:
- --default-not-ready-toleration-seconds default 300
- --default-unreachable-toleration-seconds default 300
kubelet includes:
- --node-status-update-frequency default 10s

If you set up the cluster with kubeadm you can modify:

/etc/kubernetes/manifests/kube-controller-manager.yaml for controller manager options.
/etc/kubernetes/manifests/kube-apiserver.yaml for api server options.

Note: Modifying these files will reconfigure and restart the respective pod in the node.

In order to modify kubelet config you can add below line:

KUBELET_EXTRA_ARGS="--node-status-update-frequency=10s"

To /etc/default/kubelet (for DEBs), or /etc/sysconfig/kubelet (for RPMs) and then restart kubelet service:

sudo systemctl daemon-reload && sudo systemctl restart kubelet

answered Oct 22 '22 05:10

Root G

This is what happens when node dies or go into offline mode:

The kubelet posts its status to masters by --node-status-update-fequency=10s.
Node goes offline
kube-controller-manager is monitoring all the nodes by --node-monitor-period=5s
kube-controller-manager will see the node is unresponsive and has the grace period --node-monitor-grace-period=40s until it considers node unhealthy. PS: This parameter should be in N x node-status-update-fequency
Once the node marked unhealthy, the kube-controller-manager will remove the pods based on --pod-eviction-timeout=5m

Now, if you tweaked the parameter pod-eviction-timeout to say 30 seconds, it will still take

 node status update frequency: 10s
 node-monitor-period: 5s
 node-monitor-grace-period: 40s
 pod-eviction-timeout: 30s

Total 70 seconds to evict the pod from node The node-status-update-fequecy and node-monitor-grace-period time counts in node-monitor-grace-period also. You can tweak these variable as well to further lower down your total node eviction time.

This is my kube-controller-manager.yaml (present at /etc/kubernetes/manifests for kubeadm) file:

containers:
  - command:
    - kube-controller-manager
    - --controllers=*,bootstrapsigner,tokencleaner
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --pod-eviction-timeout=30s
    - --address=127.0.0.1
    - --use-service-account-credentials=true
    - --kubeconfig=/etc/kubernetes/controller-manager.conf

I am effectively seeing my pods get evicted in 70s once I turn off my node.

EDIT2:

Run following command on master and check that the --pod-eviction-timeout comes as 20s.

[root@ip-10-0-1-12]# docker ps --no-trunc | grep "kube-controller-manager"

9bc26f99dcfe6ac0e7b2abf22bff67af6060561ee8c0cdff08e11c3a479f182c   sha256:40c8d10b2d11cbc3db2e373a5ffce60dd22dbbf6236567f28ac6abb7efbfc8a9                     
"kube-controller-manager --leader-elect=true --use-service-account-credentials=true --root-ca-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key \
**--pod-eviction-timeout=30s** --address=127.0.0.1 --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --service-account-private-key-file=/etc/kubernetes/pki/sa.key --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --allocate-node-cidrs=true --cluster-cidr=192.168.13.0/24 --node-cidr-mask-size=24"

If here --pod-eviction-timeout is 5m and not 20s then your changes are not applied properly.

answered Oct 22 '22 07:10

Prafull Ladha

Related questions
                            
                                Printing not being logged by Kubernetes
                            
                                Does a completed kubernetes pod still reserves the required resources?
                            
                                adding single quotes to helm value
                            
                                All Kubernetes proxy targets down - Prometheus Operator
                            
                                How to debug Kubernetes CreateContainerConfigError
                            
                                Multiple K8S containers connecting to Google Cloud SQL through proxy
                            
                                Get Deployment annotation from a Kubernetes Pod
                            
                                Cannot deploy MySQL pod. --initialize specified but the data directory has files in it
                            
                                kubelet.service: Main process exited, code=exited, status=255/n/a
                            
                                Using Horizontal Pod Autoscaling along with resource requests and limits
                            
                                Pass multiple variables in helm template
                            
                                What is the difference between Kubernetes and Amazon ECS
                            
                                How to access GKE kubectl proxy dashboard?
                            
                                kubectl pod fails to pull down an AWS ECR image
                            
                                what is the relationship between EXPOSE in the dockerfile and TARGETPORT in the service YAML and actual running port in the Pod?
                            
                                Schedule cron job to never happen?
                            
                                Set vm.max_map_count on cluster nodes
                            
                                Kubernetes NGINX Ingress configmap 301 redirect
                            
                                Kubernetes pods disappear after failed jobs
                            
                                Kubernetes Secret is persisting through deletes

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Kubernetes recreate pod if node becomes offline timeout

Tags:

kubernetes

kube-controller-manager

Jure Potocnik

People also ask

2 Answers

Recommended Solution

Available Options

Root G

Prafull Ladha

Recent Activity

Donate For Us