 

What happens when the Kubernetes master fails?

I've been trying to figure out what happens when the Kubernetes master fails in a cluster that only has one master. Do web requests still get routed to pods if this happens, or does the entire system just shut down?

According to the OpenShift 3 documentation (OpenShift is built on top of Kubernetes): https://docs.openshift.com/enterprise/3.2/architecture/infrastructure_components/kubernetes_infrastructure.html, if a master fails, nodes continue to function properly, but the system loses its ability to manage pods. Is this the same for vanilla Kubernetes?

asked Aug 26 '16 by David Newswanger

People also ask

What happens if master node fails?

The cluster will not be able to respond to node failures, create new resources, move pods to new nodes, etc.

What happens if Kubernetes node fails?

Irrespective of the workload type (StatefulSet or Deployment), Kubernetes will automatically evict the pods on the failed node and then try to recreate new ones with the old volumes. If the node is back online within 5–6 minutes of the failure, Kubernetes will restart the pods, unmount, and re-mount the volumes.
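That roughly five-minute window comes from the default NoExecute tolerations that the DefaultTolerationSeconds admission plugin injects into every pod. A minimal sketch of how to inspect them (the pod name my-app-pod is hypothetical):

    # Show the injected default tolerations on a pod (pod name is an example)
    kubectl get pod my-app-pod -o yaml | grep -B3 tolerationSeconds
    # Typical output (added automatically at admission time):
    #   - effect: NoExecute
    #     key: node.kubernetes.io/not-ready
    #     operator: Exists
    #     tolerationSeconds: 300
    #   - effect: NoExecute
    #     key: node.kubernetes.io/unreachable
    #     operator: Exists
    #     tolerationSeconds: 300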

Can Kubernetes work without master node?

No. As you can read in the Kubernetes Components section of the documentation: master components provide the cluster's control plane.

What happens if etcd goes down?

The etcd cluster is considered failed if a majority of etcd members have permanently failed. After an etcd cluster failure, all running workloads might continue operating. However, due to etcd's role, Kubernetes cannot make any changes to its current state.
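One hedged way to observe this from a master node is to query etcd directly; the endpoint address and certificate paths below are kubeadm defaults and may differ in your setup:

    # Check etcd member status from a master node (paths assume kubeadm)
    ETCDCTL_API=3 etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      endpoint status --cluster -w table
    # If a majority of members are down, writes fail (e.g. "context deadline
    # exceeded") and Kubernetes can no longer persist state changes.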


1 Answer

In typical setups, the master nodes run both the API server and etcd, and are either largely or fully responsible for managing the underlying cloud infrastructure. When they are offline or degraded, the API will be offline or degraded.

In the event that the masters, etcd, or the API are fully offline, the cluster ceases to be a cluster and is instead a bunch of ad-hoc nodes for that period. The cluster will not be able to respond to node failures, create new resources, move pods to new nodes, etc., until both of the following hold (quick checks for each are sketched after the list):

  1. Enough etcd instances are back online to form a quorum and make progress (for a visual explanation of how this works and what these terms mean, see this page).
  2. At least one API server can service requests.
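As a rough illustration of both conditions (quorum is floor(n/2) + 1, so a 3-member etcd cluster makes progress with 2 members up but stalls with only 1), the probes below are a sketch, not an exhaustive health check; the etcd endpoint list is an assumption and TLS flags are omitted for brevity:

    # Condition 1: etcd quorum (substitute your real member addresses)
    ETCDCTL_API=3 etcdctl \
      --endpoints=https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379 \
      endpoint health
    # Condition 2: at least one API server answering requests
    kubectl get --raw='/readyz?verbose'   # per-check readiness report
    kubectl get --raw='/healthz'          # older aggregate health endpoint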

In a partially degraded state, the API server may be able to respond to requests that only read data.
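For example (a hypothetical failure state, assuming a Deployment named web exists), reads served from the API server's watch cache may still succeed while writes fail:

    kubectl get pods -A                     # may still return (possibly stale) data
    kubectl scale deploy/web --replicas=3   # writes need etcd and will fail,
    # typically with an error like "etcdserver: request timed out"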

However, in any case, life for applications will continue as normal unless nodes are rebooted or there is a dramatic failure of some sort during this time, because TCP/UDP services, load balancers, DNS, the dashboard, etc. should all continue to function for at least some time. Eventually, these things will all fail on different timescales. In single-master setups or during a complete API failure, DNS failure will probably happen first as caches expire (on the order of minutes, though the exact timing is configurable; see the CoreDNS cache plugin documentation). This is a good reason to consider a multi-master setup: DNS and service routing can continue to function indefinitely in a degraded state, even if etcd can no longer make progress.
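To see the cache TTL in question, you can inspect the CoreDNS Corefile; the resource name coredns in kube-system is the kubeadm default and may differ elsewhere:

    kubectl -n kube-system get configmap coredns -o yaml | grep cache
    # A typical Corefile stanza looks like:
    #   cache 30    # cache DNS answers for up to 30 seconds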

There are actions that you could take as an operator which would accelerate failures, especially in a fully degraded state. For instance, rebooting a node would cause DNS queries, and in fact probably all pod and service networking, to fail until at least one master comes back online. Restarting the DNS pods or kube-proxy would be similarly harmful.

If you'd like to test this out yourself, I recommend kubeadm-dind-cluster, kind, or, for more exotic setups, kubeadm on VMs or bare metal. Note: kubectl proxy will not work during API failure, as it routes traffic through the master(s).
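Here is a minimal sketch of such an experiment with kind; the node container names (kind-control-plane, kind-worker) are kind's defaults for this config, and the deployment name web is arbitrary:

    # Two-node cluster so the workload survives independently of the master
    cat <<EOF > kind-two-node.yaml
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
    - role: control-plane
    - role: worker
    EOF
    kind create cluster --config kind-two-node.yaml
    kubectl create deployment web --image=nginx

    # Simulate total master failure by stopping the control-plane container
    docker stop kind-control-plane
    kubectl get pods                     # fails: the API is gone
    docker exec kind-worker crictl ps    # ...but nginx keeps running on the worker

    # Bring the master back and the cluster reassembles itself
    docker start kind-control-plane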

answered Sep 30 '22 by pnovotnak