We have a Kubernetes cluster with 1 master and 3 nodes, managed by kops, that we use for our application deployment. We have minimal pod-to-pod connectivity, but we like the autoscaling features in Kubernetes. We've been using this for the past few months, but recently we've started having an issue where our pods randomly cannot connect to our Redis or database, with an error like:
Set state pending error: dial tcp: lookup redis.id.0001.use1.cache.amazonaws.com on 100.64.0.10:53: read udp 100.126.88.186:35730->100.64.0.10:53: i/o timeout
or
OperationalError: (psycopg2.OperationalError) could not translate host name "postgres.id.us-east-1.rds.amazonaws.com" to address: Temporary failure in name resolution
What's stranger is that this only occurs some of the time; when a pod is recreated it will work again, then start failing again shortly after.
We have tried following all of Kubernetes' kube-dns debugging instructions to no avail, tried countless solutions like changing the ndots configuration, and even experimented with moving to CoreDNS, but we still have the exact same intermittent issues. We use Calico for networking, but it's hard to say whether this is occurring at the network level, as we haven't seen issues with any other services.
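For reference, the checks from that debugging guide look roughly like this; the dnsutils pod is just a throwaway test pod (any image that ships nslookup works), not something from our manifests:

⇒ kubectl run dnsutils --image=tutum/dnsutils --command -- sleep infinity
⇒ kubectl exec -ti dnsutils -- nslookup kubernetes.default
⇒ kubectl exec -ti dnsutils -- nslookup redis.id.0001.use1.cache.amazonaws.com
⇒ kubectl exec -ti dnsutils -- cat /etc/resolv.conf

These pass whenever we run them by hand, which is what makes the intermittent timeouts so confusing.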
Does anyone have any ideas of where else to look for what could be causing this behavior, or if you've experienced this behavior before yourself could you please share how you resolved it?
Thanks
The pods for CoreDNS look OK
⇒ kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
...
coredns-784bfc9fbd-xwq4x 1/1 Running 0 3h
coredns-784bfc9fbd-zpxhg 1/1 Running 0 3h
...
We have enabled logging on CoreDNS and seen requests actually coming through:
⇒ kubectl logs coredns-784bfc9fbd-xwq4x --namespace=kube-system
.:53
2019-04-09T00:26:03.363Z [INFO] CoreDNS-1.2.6
2019-04-09T00:26:03.364Z [INFO] linux/amd64, go1.11.2, 756749c
CoreDNS-1.2.6
linux/amd64, go1.11.2, 756749c
[INFO] plugin/reload: Running configuration MD5 = 7f2aea8cc82e8ebb0a62ee83a9771ab8
[INFO] Reloading
[INFO] plugin/reload: Running configuration MD5 = 73a93c15a3b7843ba101ff3f54ad8327
[INFO] Reloading complete
...
2019-04-09T02:41:08.412Z [INFO] 100.126.88.129:34958 - 18745 "AAAA IN sqs.us-east-1.amazonaws.com.cluster.local. udp 59 false 512" NXDOMAIN qr,aa,rd,ra 152 0.000182646s
2019-04-09T02:41:08.412Z [INFO] 100.126.88.129:51735 - 62992 "A IN sqs.us-east-1.amazonaws.com.cluster.local. udp 59 false 512" NXDOMAIN qr,aa,rd,ra 152 0.000203112s
2019-04-09T02:41:13.414Z [INFO] 100.126.88.129:33525 - 52399 "A IN sqs.us-east-1.amazonaws.com.ec2.internal. udp 58 false 512" NXDOMAIN qr,rd,ra 58 0.001017774s
2019-04-09T02:41:18.414Z [INFO] 100.126.88.129:44066 - 47308 "A IN sqs.us-east-1.amazonaws.com. udp 45 false 512" NOERROR qr,rd,ra 140 0.000983118s
...
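As an aside, those log lines show each external name being walked through the pod's DNS search path first (NXDOMAIN for the .cluster.local. and .ec2.internal. expansions, then NOERROR for the bare name), which is the ndots:5 default we tried tuning. The search path is visible from inside any affected pod (<app-pod> is a placeholder); the output below is what we'd expect given the defaults, assuming the default namespace:

⇒ kubectl exec -ti <app-pod> -- cat /etc/resolv.conf
nameserver 100.64.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5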
Service and endpoints look OK
⇒ kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 100.64.0.10 <none> 53/UDP,53/TCP 63d
...
⇒ kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 100.105.44.88:53,100.127.167.160:53,100.105.44.88:53 + 1 more... 63d
...
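Since the timeouts are against the service VIP (100.64.0.10), one more isolation step is to query each CoreDNS endpoint directly and see whether a particular pod (or the node it runs on) is the one dropping packets; the endpoint IPs here are the ones listed above:

⇒ kubectl exec -ti dnsutils -- dig @100.105.44.88 sqs.us-east-1.amazonaws.com
⇒ kubectl exec -ti dnsutils -- dig @100.127.167.160 sqs.us-east-1.amazonaws.com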
CoreDNS is a single container per instance, vs kube-dns, which uses three. kube-dns uses dnsmasq for caching, which is single-threaded C, while CoreDNS is multi-threaded Go. CoreDNS also enables negative caching in the default deployment.
CoreDNS is a flexible, extensible DNS server that can serve as the Kubernetes cluster DNS. Like Kubernetes, the CoreDNS project is hosted by the CNCF.
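The negative caching comes from the cache plugin in the stock Corefile. You can check what your deployment is actually running with something like this; the output shown is the default Corefile of that era (the cache 30 line is what enables caching, positive and negative), so yours may differ:

⇒ kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}'
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        upstream
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}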
We also encountered this issue, but in our case it came down to query timeouts.
After testing, the best approach was to run DNS on every node, with each pod referring to its own node's DNS. This saves round trips to DNS pods on other nodes: you may run multiple pods for DNS, but the DNS service distributes traffic across all of them, so pods end up generating more network traffic across nodes. I'm not sure whether this is possible on Amazon EKS.
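If your Kubernetes version is recent enough, the upstream implementation of this idea is the NodeLocal DNSCache add-on, which runs a caching agent as a DaemonSet and has every pod talk to a link-local address on its own node. The install is roughly the following (the __PILLAR__ substitution variables come from the add-on's manifest; 169.254.20.10 is the conventional link-local listen address):

⇒ wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml
⇒ kubedns=$(kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}')
⇒ sed -i "s/__PILLAR__LOCAL__DNS__/169.254.20.10/g; s/__PILLAR__DNS__DOMAIN__/cluster.local/g; s/__PILLAR__DNS__SERVER__/$kubedns/g" nodelocaldns.yaml
⇒ kubectl apply -f nodelocaldns.yaml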