 

Pods not moved on host failure

Tags:

kubernetes

I have set up a simple cluster with 1 master and 3 worker nodes running on Ubuntu, based on the book "Kubernetes: Up & Running" in combination with the official documentation.

It basically works until I shut down one of the worker nodes. After a few seconds the node's state switches to Unknown. The pods keep reporting the state Running, even the pods that are located on the offline node.

Shouldn't k8s move these pods to a different, healthy host? Am I missing something?

Thanks in advance!

thepill asked Dec 23 '22 19:12


1 Answer

With Kubernetes version 1.13 and higher, pod eviction on node failure or not-ready conditions is controlled by taints and tolerations. The --pod-eviction-timeout parameter is no longer used.

When a node goes down or is not ready, node-controller/kubelet adds the matching taint to the node: node.kubernetes.io/unreachable or node.kubernetes.io/not-ready. All pods tolerate these taints for 300 seconds by default, which is why the pods still show as Running on the offline node for the first five minutes. You can control this toleration time cluster-wide for all pods with flags to kube-apiserver, and also per pod using the tolerations object in the pod spec.

Cluster-wide configuration:

You can modify the toleration time cluster-wide using the --default-not-ready-toleration-seconds and --default-unreachable-toleration-seconds flags to kube-apiserver.

From docs:

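--default-not-ready-toleration-seconds int     Default: 300
Indicates the tolerationSeconds of the toleration for notReady:NoExecute that is added by default to every pod that does not already have such a toleration.
--default-unreachable-toleration-seconds int     Default: 300
Indicates the tolerationSeconds of the toleration for unreachable:NoExecute that is added by default to every pod that does not already have such a toleration.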
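
As a rough sketch of where these flags could go: this assumes the cluster was set up with kubeadm, so that the API server runs as a static pod defined in /etc/kubernetes/manifests/kube-apiserver.yaml. The file path and the 60 second values are assumptions for illustration and are not taken from the question. Other setups pass the same flags through a systemd unit or a similar mechanism.

```yaml
# Fragment of /etc/kubernetes/manifests/kube-apiserver.yaml
# (assumed kubeadm layout; only the two added flags are shown in full)
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # ... keep all existing flags unchanged ...
    # Example values only: tolerate a not-ready or unreachable node for 60s
    - --default-not-ready-toleration-seconds=60
    - --default-unreachable-toleration-seconds=60
```

These defaults are added to a pod when it is created, so pods that already exist keep the tolerations they were created with until they are recreated.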

Per pod configuration:

You can also modify the toleration time per pod using the following configuration.

tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 120
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 120
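
For pods managed by a Deployment, the tolerations belong in the pod template (spec.template.spec), not at the top level of the Deployment. A minimal sketch, where the name, labels, replica count and image are placeholders rather than anything from the question:

```yaml
# Hypothetical Deployment; name, labels and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx
      # Evict these pods 120 seconds after the node is tainted,
      # instead of the default 300 seconds.
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 120
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 120
```

Once the toleration expires, the pods on the failed node are evicted and the Deployment's controller creates replacements on the remaining healthy nodes.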

https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#taint-based-evictions

Shashank V answered Jan 04 '23 02:01