I have a cluster consisting of 6 servers, 3 masters and 3 workers. Up to this morning everything worked fine, until I removed two workers from the cluster.
Now internal DNS is not working anymore: I cannot resolve any internal name. External names such as google.com, however, do resolve and I can ping them.
My cluster is running Kubernetes v1.18.2 (Calico for networking), installed with Kubespray. I can access my services from outside, but when they try to connect to each other, they fail (for example when the UI tries to connect to the database).
I provide below some of the output from the command listed here: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
kubectl exec -ti busybox-6899b748d7-pbdk4 -- cat /etc/resolv.conf
nameserver 10.233.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ovh.net
options ndots:5
kubectl exec -ti busybox-6899b748d7-pbdk4 -- nslookup kubernetes.default
Server: 10.233.0.10
Address: 10.233.0.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
command terminated with exit code 1
kubectl exec -ti busybox-6899b748d7-pbdk4 -- nslookup google.com
Server: 10.233.0.10
Address: 10.233.0.10:53
Non-authoritative answer:
Name: google.com
Address: 172.217.22.142
*** Can't find google.com: No answer
kubectl exec -ti busybox-6899b748d7-pbdk4 -- ping google.com
PING google.com (172.217.22.142): 56 data bytes
64 bytes from 172.217.22.142: seq=0 ttl=52 time=4.409 ms
64 bytes from 172.217.22.142: seq=1 ttl=52 time=4.359 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 4.359/4.384/4.409 ms
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-74b594f4c6-5k6kq   1/1     Running   2          6d7h
coredns-74b594f4c6-9ct8x   1/1     Running   0          16m
When I get the logs for the DNS pods:

for p in $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do kubectl logs --namespace=kube-system $p; done

they are full of errors like:
E0522 11:56:22.613704 1 reflector.go:153] pkg/mod/k8s.io/client-go@<version>/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: net/http: TLS handshake timeout
E0522 11:56:33.678487 1 reflector.go:307] pkg/mod/k8s.io/client-go@<version>/tools/cache/reflector.go:105: Failed to watch *v1.Service: Get https://10.233.0.1:443/api/v1/services?allowWatchBookmarks=true&resourceVersion=1667490&timeout=8m12s&timeoutSeconds=492&watch=true: dial tcp 10.233.0.1:443: connect: connection refused
E0522 12:19:42.356157 1 reflector.go:307] pkg/mod/k8s.io/client-go@<version>/tools/cache/reflector.go:105: Failed to watch *v1.Namespace: Get https://10.233.0.1:443/api/v1/namespaces?allowWatchBookmarks=true&resourceVersion=1667490&timeout=6m39s&timeoutSeconds=399&watch=true: dial tcp 10.233.0.1:443: connect: connection refused
E0522 12:19:42.356327 1 reflector.go:307] pkg/mod/k8s.io/client-go@<version>/tools/cache/reflector.go:105: Failed to watch *v1.Service: Get https://10.233.0.1:443/api/v1/services?allowWatchBookmarks=true&resourceVersion=1667490&timeout=6m41s&timeoutSeconds=401&watch=true: dial tcp 10.233.0.1:443: connect: connection refused
The coredns service is up:

kubectl get svc --namespace=kube-system
NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
coredns                     ClusterIP   10.233.0.3      <none>        53/UDP,53/TCP,9153/TCP   7d4h
dashboard-metrics-scraper   ClusterIP   10.233.52.242   <none>        8000/TCP                 7d4h
kubernetes-dashboard        ClusterIP   10.233.63.42    <none>        443/TCP                  7d4h
voyager-operator            ClusterIP   10.233.31.206   <none>        443/TCP,56791/TCP        6d5h
The endpoints are exposed:

kubectl get ep coredns --namespace=kube-system
NAME      ENDPOINTS                                         AGE
coredns   10.233.68.9:53,10.233.79.7:53,10.233.68.9:9153 + 3 more...   7d4h
What did I break? How can I fix this?
EDIT: More information requested in the comments:

kubectl get pods -n kube-system
NAME                                          READY   STATUS    RESTARTS   AGE
calico-kube-controllers-5d9cfb4bfd-8h7jd      1/1     Running   0          3d14h
calico-node-6w8g6                             1/1     Running   13         4d15h
calico-node-78thq                             1/1     Running   6          7d19h
calico-node-cr4jl                             1/1     Running   23         4d16h
calico-node-g5q99                             1/1     Running   1          3d15h
calico-node-pmss2                             1/1     Running   0          3d15h
calico-node-zw9fk                             1/1     Running   18         4d19h
coredns-74b594f4c6-5k6kq                      1/1     Running   2          6d22h
coredns-74b594f4c6-9ct8x                      1/1     Running   0          15h
dns-autoscaler-7594b8c675-j5jfv               1/1     Running   0          15h
kube-apiserver-kub1                           1/1     Running   42         7d20h
kube-apiserver-kub2                           1/1     Running   1          7d19h
kube-apiserver-kub3                           1/1     Running   33         7d19h
kube-controller-manager-kub1                  1/1     Running   37         7d20h
kube-controller-manager-kub2                  1/1     Running   4          3d15h
kube-controller-manager-kub3                  1/1     Running   55         7d19h
kube-proxy-4dlf8                              1/1     Running   4          4d15h
kube-proxy-4nlhf                              1/1     Running   2          4d15h
kube-proxy-82kkz                              1/1     Running   3          4d15h
kube-proxy-lvsfz                              1/1     Running   0          3d15h
kube-proxy-pmhnx                              1/1     Running   4          4d15h
kube-proxy-wpfnn                              1/1     Running   10         4d15h
kube-scheduler-kub1                           1/1     Running   34         7d20h
kube-scheduler-kub2                           1/1     Running   3          7d19h
kube-scheduler-kub3                           1/1     Running   51         7d19h
kubernetes-dashboard-7dbcd59666-79gxv         1/1     Running   0          3d14h
kubernetes-metrics-scraper-6858b8c44d-g9m9w   1/1     Running   1          5d22h
nginx-proxy-galaxy                            1/1     Running   2          4d15h
nginx-proxy-kub4                              1/1     Running   7          4d19h
nginx-proxy-kub5                              1/1     Running   6          4d16h
nodelocaldns-2dv59                            1/1     Running   0          3d15h
nodelocaldns-9skxm                            1/1     Running   5          4d16h
nodelocaldns-dwg4z                            1/1     Running   4          4d15h
nodelocaldns-nmwwz                            1/1     Running   12         7d19h
nodelocaldns-qkq8n                            1/1     Running   4          4d19h
nodelocaldns-v84jj                            1/1     Running   8          7d19h
voyager-operator-5677998d47-psskf             1/1     Running   10         4d15h
Kubernetes DNS schedules a DNS Pod and Service on the cluster, and configures the kubelets to tell individual containers to use the DNS Service's IP to resolve DNS names. Every Service defined in the cluster (including the DNS server itself) is assigned a DNS name.
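That naming scheme can be checked directly with a couple of lookups (a sketch; run these from any pod with a working resolver, and adjust cluster.local if your cluster uses a different domain):

```shell
# Every Service gets a record of the form <service>.<namespace>.svc.<cluster-domain>.
nslookup kubernetes.default.svc.cluster.local   # fully qualified name

# Short forms work because they are expanded via the search list
# in the pod's /etc/resolv.conf (default.svc.cluster.local, svc.cluster.local, ...).
nslookup kubernetes.default
```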
I was able to reproduce the scenario.
$ kubectl exec -it busybox -n dev -- nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
command terminated with exit code 1
$ kubectl exec -it busybox -n dev -- nslookup google.com
Server: 10.96.0.10
Address: 10.96.0.10:53
Non-authoritative answer:
Name: google.com
Address: 172.217.168.238
*** Can't find google.com: No answer
$ kubectl exec -it busybox -n dev -- ping google.com
PING google.com (172.217.168.238): 56 data bytes
64 bytes from 172.217.168.238: seq=0 ttl=52 time=18.425 ms
64 bytes from 172.217.168.238: seq=1 ttl=52 time=27.176 ms
64 bytes from 172.217.168.238: seq=2 ttl=52 time=18.603 ms
64 bytes from 172.217.168.238: seq=3 ttl=52 time=15.445 ms
64 bytes from 172.217.168.238: seq=4 ttl=52 time=16.492 ms
64 bytes from 172.217.168.238: seq=5 ttl=52 time=19.294 ms
^C
--- google.com ping statistics ---
6 packets transmitted, 6 packets received, 0% packet loss
round-trip min/avg/max = 15.445/19.239/27.176 ms
But when I followed the same steps using the dnsutils image mentioned in the Kubernetes docs, I got a positive response.
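For reference, the dnsutils pod from that doc can be created with a manifest along these lines (a sketch adapted from the DNS debugging guide; the namespace is just the one used in this test, and the image tag may differ for your cluster version):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: dev
spec:
  containers:
  - name: dnsutils
    image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
    command: ["sleep", "infinity"]   # keep the pod alive for interactive exec
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
```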
$ kubectl exec -ti dnsutils -n dev -- nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
$ kubectl exec -ti dnsutils -n dev -- nslookup google.com
Server: 10.96.0.10
Address: 10.96.0.10#53
Non-authoritative answer:
Name: google.com
Address: 172.217.168.238
Name: google.com
Address: 2a00:1450:400e:80c::200e
As per my understanding, the problem lies with the nslookup applet shipped in the busybox image rather than with cluster DNS itself: recent busybox releases are known to have an unreliable nslookup that reports NXDOMAIN / "No answer" even for names that resolve correctly. That's why we're getting this DNS resolution error.
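If you want to keep using busybox for these checks, pinning an older image tag is a common workaround (a sketch; busybox 1.28 is widely reported to be the last tag whose nslookup behaves correctly against cluster DNS):

```shell
# Run a throwaway busybox 1.28 pod and query cluster DNS from it
# (the pod name is arbitrary; add -n <namespace> if needed).
kubectl run busybox128 --rm -it --restart=Never --image=busybox:1.28 \
  -- nslookup kubernetes.default
```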